ABSTRACT


This document presents the preliminary specifications for the design and
implementation of an Information Management System capable of being integrated
with an extensive Mathematical Analysis System to be implemented on the
ILLIAC IV computer. The system would provide multiple keyed access to several
large data bases and the capability of receiving instructions and questions
expressed in a simple inquiry language. Mathematical and statistical routines
will be interfaced through the Analysis System. There will also be a report
generator which displays the responses to questions in forms easily
interpretable by researchers.

I. The Information Management System as an Information Management Machine

A.   Introduction 

This  is  a  proposal  for  providing  an  Information  Management  System 
on  the  ILLIAC  IV  computer.   The  system  will  interpret  programmable  queries 
to  determine  which  records  of  which  files  must  be  retrieved.   From  these 
records,  data  will  be  extracted  and  operated  upon  by  statistical  or  other 
mathematical  routines  according  to  the  researcher's  request.   The  responses 
to  these  requests  will  then  be  translated  and  output  to  the  user  in  an 
easily  readable  form.   The  user  can  interpret  the  results  and  formulate  new 
questions  to  be  asked  of  the  system.   All  the  conventional  data  management 
functions  such  as  sorting,  merging,  and  building  files  will  also  be  provided. 

It may be instructive to the reader to view this system as a conventional
special-purpose micro-programmable computer designed only to perform the tasks
of information management and processing. Whereas a conventional computer
operates on words of data, this "machine", referred to as the Information
Management Machine (IMM), will operate on values of elementary data items. An
elementary data item is an alpha-numeric name of a non-divisible piece of data,
e.g., SOIL_TYPE or EMPLOYEE_NO. An additional difference is that conventional
machines use words of fixed size, but the IMM uses variable-length elementary
data items. The analogy is diagrammed in Figure 1. The IMM will be simulated on
the ILLIAC IV computer system and will be known as the ILLIAC IV Information
Management System (IMS).
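As a rough illustration of this contrast, the following Python sketch (an assumption made for exposition, not part of the proposed design) treats a record as a list of named, variable-length elementary data items and counts how many fixed 64-bit words a conventional machine would need for each value. The item values and the 8-bits-per-character encoding are hypothetical.

```python
from dataclasses import dataclass

WORD_BITS = 64      # fixed word size of a conventional machine (64 bits on ILLIAC IV)
BITS_PER_CHAR = 8   # assumed character encoding for this sketch


@dataclass
class ElementaryItem:
    """A named, non-divisible, variable-length piece of data."""
    name: str   # alpha-numeric name, e.g. SOIL_TYPE
    value: str  # value of arbitrary length

    def length_bits(self) -> int:
        return BITS_PER_CHAR * len(self.value)


def words_needed(item: ElementaryItem) -> int:
    """Number of fixed-size words a word-oriented machine would use for the value."""
    return max(1, -(-item.length_bits() // WORD_BITS))  # ceiling division, at least 1


# Hypothetical record made of two elementary data items.
record = [
    ElementaryItem("EMPLOYEE_NO", "40317"),
    ElementaryItem("SOIL_TYPE", "SILTY CLAY LOAM"),
]

# The IMM addresses each item by its name, whatever its length ...
by_name = {item.name: item.value for item in record}
print(by_name["SOIL_TYPE"])                     # SILTY CLAY LOAM
# ... whereas a word machine must round every value up to whole words.
print([words_needed(item) for item in record])  # [1, 2]
```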

The IMS is composed of four main interacting modules: the Data Management
System (DMS), the Symbolic Data Structuring System (SDSS), the Information
Retrieval System (IRS), and the Mathematical Computation System (MCS). Figure 1
is also a diagram of the control and information flow between these modules.
Each module essentially performs the same functions as its counterpart in the
conventional machine. Note that a micro-programmable computer can change its
programming semantics without changing the instruction stream. Similarly, the
Information Management Machine can change its function without altering the
languages used between the modules. This allows the various modules to be
developed simultaneously and relatively independently of each other. It also
reduces complex interactions between the design groups, which speeds up
implementation and eases debugging.
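
[Figure 1: the analogy between a conventional micro-programmable computer and the Information Management Machine, and the control and information flow among the DMS, SDSS, IRS, and MCS]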


Also, the "micro-programming" facility of the Information Management Machine
provides the capability for the system to respond to individual users, so that
user-dependent optimization of searching times may easily be implemented.

At  this  point,  a  brief  examination  of  the  glossary  of  terms  in 
Appendix  B,  page  21,  would  facilitate  further  reading. 

B.   Design  Goals 

The  rationale  of  any  information  management  system  can  only  be 
understood  in  the  light  of  the  type  of  data  to  be  handled  and  the  operations 
to  be  performed  on  the  data.   A  system  that  handles  only  one  data  base,  which 
expands,  but  never  changes  structure,  and  is  only  concerned  with  routine  re- 
trievals, can  be  designed  to  be  very  efficient  due  to  the  lack  of  generality 
required  by  the  users.   Since  the  designers  know  a  great  deal  about  this 
system,  many  short  cuts  may  be  taken;  however,  the  more  assumptions  made  and 
the  more  short  cuts  taken,  which  do  improve  system  speed,  the  greater  the 
chance  that  the  system  could  be  rendered  useless  under  production  stresses  - 
the  data  base  could  grow  too  large  or  queries  could  begin  to  get  so  complex 
that  serial  searches  are  constantly  performed  (serial  searches  are  very  slow 
on  serial  machines).   Since  the  system  was  "hard"  coded  to  take  advantage  of 
short  cuts,  changes  to  the  system  are  more  likely  to  require  the  entire  re- 
trieval and  data  structuring  subsystems  to  be  rewritten.   Therefore,  before 
suggesting  a  design,  we  must  define,  as  closely  as  possible,  the  classes  of 
data  to  be  manipulated,  the  types  of  processing  most  likely  to  be  desired, 
and  the  response  times  needed. 

Since  the  IMS  will  be  used  by  a  wide  variety  of  researchers  in  the 
social,  physical,  and  life  sciences,  it  is  apparent  that  the  system  must 
truly  be  a  generalized  information  management  and  modeling  system.   With  such 
diverse  users,  no  assumptions  can  be  made  about  the  operations  that  will  be 
performed  on  data  records,  files,  and  data  bases.   The  system  must  also  be 
amenable  to  changes — some  of  which  may  be  anticipated. 

It is apparent that this system will be a query system, as opposed to a
transaction system; that is, it will in general process complex requests
instead of simple data item retrievals. It is more likely to answer queries of
the form "Is there a significant difference between blacks and whites in
attending health facilities in integrated slums?" rather than "What was the
temperature in Champaign, Illinois on December 1, 1970?"

For  a  system  to  be  truly  a  research  tool,  it  must  provide  an  effec- 
tive computational  facility.   It  must  provide  statistical  and  other  modeling 
and  simulation  capabilities;  these  systems  must  be  integrated  into  the  over- 
all information  management  process.   This  is  to  avoid  having  researchers 
spend  months  manipulating  data  from  one  system  to  analyze  it  on  another. 

The  speed  of  such  a  system  is  important  if  interactive  graphic 
monitoring  becomes  desirable.   There  is  a  growing  need  to  be  able  to  create 
experimental  files  during  simulations  and  display  the  results  as  they  are 
found.   If  a  researcher  does  not  like  what  he  sees,  he  would  like  to  be  able 
to  call  a  file  which  maintains  a  program  of  the  model  or  the  input  data  and 
make  immediate  changes  and  rerun  the  model.   He  wants  to  think  about  models 
not  about  the  data  or  the  computer.   If  he  approves  of  what  is  developing  on 
the terminal, he could save the results for later use. The requirement of
speed, which is ILLIAC IV's major asset, becomes as important as the
flexibility of the system.

The design goals are listed below; terms and concepts are explained in the
referenced sections. The system must:

1) Be a modular system which may be easily changed and which allows
incremental implementation and expansion as experience is gained.
A subset of the system must be specified which can be implemented
to begin processing data without having to wait for a complete
system. (Section III).

2) Make efficient use of ILLIAC IV's parallelism. (Section III).

3) Provide the ability to dynamically maintain system measurements
which will be used to suggest system "tuning" to improve the
performance of the system, or to measure the structural
characteristics of different data bases to detect inefficiencies.
The number and size of I/O buffers are some elementary but very
influential parameters that greatly affect the performance of such
systems. (Appendix C).


4) Provide the ability to view the data base record structure as
a tree whose nodes correspond to data items, but also allow
freely generated cross links between different nodes of the
trees in different records. (Appendix D).

5)  Allow  the  user  to  easily  retrieve  records  by  selecting  any 
data  items  to  be  utilized  as  keys  upon  which  to  search  for  a 
subset  of  records  within  a  file.   (Appendix  D). 

6) Provide multiple keyed access to large data bases that would
exist in arbitrarily constructed and dynamically formed
disk-sized blocks of less than 10^9 bits (the initial system size
of the disk). Each variable disk block would be made up of several
files, each on the order of 10^8 bits in size (the approximate size
of a tape file). (Appendix D).

7)  Provide  the  user  the  capability  of  using  alpha-numeric  names  as 
a  means  of  identifying  data  at  any  level.   (Appendix  D). 

8)  Provide  complete  flexibility  for  entering,  deleting,  structur- 
ing, and  naming  data  bases,  files,  records  and  data  items  that 
could contain information in any processable form. (Appendix D).

9)  Provide  the  capability  to  protect  data  from  misuses  by  unautho- 
rized system  users.   (Appendix  D). 

10) Provide for the general alteration of data structures.
(Appendix D).

11) Provide conditional control in the query programs. This will
allow different actions to be invoked depending on the values of
retrieved data items or the outcomes of the computational
routines. (Appendix E).

12)  Provide  the  capability  to  use  the  system  through  one  uniform 
language.   (Appendix  E). 

13)  Provide  a  basic  and  expandable  source  of  mathematical  and 
statistical  routines  to  operate  on  researcher's  data  bases. 
(Section  III  and  Appendix  E). 


14) Allow other systems (e.g., the Linear Programming System or the
Statistical System) to use the information management facilities
of the IMS for temporary file maintenance, system program
overlaying, and data definitions. (Appendix E).

15)  Provide  comprehensive  report  generation  facilities. 
(Appendix  E). 

16) Provide the capability for simulated "time-shared interactive"
use. (Appendix E).
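Goals 5 and 6 ask that any data item be usable as a search key. One conventional way to picture this is an inverted index built on demand for each item chosen as a key, with multi-key queries answered by intersecting the index entries. The sketch below assumes in-memory records with hypothetical item names; it illustrates multiple keyed access in general, not the key-element mechanism proposed in Appendix D.

```python
from collections import defaultdict

# Hypothetical records; any data item may be chosen as a key at query time.
records = [
    {"PATIENT_NO": 1, "STATE": "NY", "WARD": "A"},
    {"PATIENT_NO": 2, "STATE": "CT", "WARD": "B"},
    {"PATIENT_NO": 3, "STATE": "NY", "WARD": "B"},
]


def build_index(records, item):
    """Inverted index: each value of `item` -> set of record positions holding it."""
    index = defaultdict(set)
    for pos, rec in enumerate(records):
        if item in rec:
            index[rec[item]].add(pos)
    return index


def select(records, indexes, **criteria):
    """Return the records that satisfy every item=value criterion."""
    matches = None
    for item, value in criteria.items():
        if item not in indexes:  # build an index the first time an item is used as a key
            indexes[item] = build_index(records, item)
        hits = indexes[item].get(value, set())
        matches = set(hits) if matches is None else matches & hits
    return [records[pos] for pos in sorted(matches or set())]


indexes = {}
result = select(records, indexes, STATE="NY", WARD="B")
print([rec["PATIENT_NO"] for rec in result])  # [3]
```

Building each index lazily means that only the items actually used as keys pay the indexing cost, which loosely mirrors the goal of letting users pick keys freely.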


II. ILLIAC IV Computer System Characteristics that Influence Information
Retrieval


A.        Hardware 


[Figure 2: ILLIAC IV Hardware Relevant to Information Retrieval. The sketch shows the B6500 with its keyboard, card reader, tapes, and ARPA Network connection; the 10^12-bit laser memory; the disk (10^9 bits/sec, average access 20 ms); the control unit (CU) for global control and processing; and processing elements PE0 to PE63, each a local processor with its own local memory (PE + PEM = processing unit, PU). Memory is drawn as rows (longwords, addresses 0 to 2047) by columns (PEMs), with 1 longword = 64 words (1 word/PE) and 1 word = 64 bits.]


Figure 2 is a sketch of the relevant ILLIAC IV hardware to aid in the
discussion. The ILLIAC IV computer can process 64 memory blocks (PEMs)
simultaneously by executing the same instruction stream (modified locally) in
each processing element (PE). Each PE has uninhibited access to its own PE
memory (with an access time of 200 ns) but may access other PE memories only
indirectly (with access times ranging from 300 to 900 ns), by loading the data
into a PE and then routing it to the requesting PE. Therefore, indirect PE
addressing is to be avoided if possible. Consequently, it appears more
fruitful to process a logical block of data, such as a record, within one PE
instead of across PEs. Memory is therefore viewed as 64 independent data
streams composed of concatenated, variable-length, logical blocks of data.

Each PE is about twice as fast as a CDC 6600 processor; at full efficiency
(i.e., all PEs working) the ILLIAC IV is two orders of magnitude faster than
the CDC 6600. The ILLIAC IV system maintains a direct-access, 10^9-bit,
head-per-track disk file system (expansion facilities could provide up to
20 × 10^9 bits) with a transfer rate of 10^9 bits/sec and an average access
time of 20 ms. The disk file is actually composed of 13 disks called Storage
Units (SUs), each containing 8 × 10^7 bits. A 10^12-bit laser memory,
accessible by file names, can transfer a complete ILLIAC IV disk load of
10^9 bits, i.e., approximately 10 magnetic tape files, in 2 minutes, or a tape
equivalent in 12 sec.

The ILLIAC IV processors have been estimated to be slightly I/O bound. Based
on an average ILLIAC IV instruction time, the processors can each execute 16
instructions in the time it takes to load 1 longword (64 words, 1 word/PEM).
Although the system is I/O bound, as are most large-scale computers engaged in
information processing, the I/O rate is extremely fast: about 400 times faster
than the IBM 2314 disk system. With the 10^9 bits/sec transfer rate from disk
to core, the ILLIAC IV can search a complete disk load, i.e., 10^9 bits or
about 10 magnetic tapes full of data, in 1 second.
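The capacity and timing figures above can be checked with a few lines of arithmetic. In the sketch below, the constants come from the text. The implied instruction time is derived under an assumption the text does not state, namely that the longword in the 16-instruction comparison is loaded at the full 10^9 bits/sec disk rate.

```python
# Figures quoted in the text.
DISK_BITS = 10**9                  # one complete ILLIAC IV disk load
DISK_RATE = 10**9                  # disk-to-core transfer rate, bits/sec
SU_COUNT, SU_BITS = 13, 8 * 10**7  # Storage Units and bits per unit
TAPE_FILES_PER_DISK = 10           # "approximately 10 magnetic tape files"
LASER_DISK_LOAD_SEC = 120          # laser memory -> disk, one full disk load (2 minutes)
WORD_BITS, N_PE = 64, 64           # 1 longword = 64 words of 64 bits
INSTR_PER_LONGWORD = 16            # instructions per PE while one longword loads

total_disk_bits = SU_COUNT * SU_BITS                       # capacity of the 13 SUs
scan_seconds = DISK_BITS / DISK_RATE                       # time to search a disk load
tape_seconds = LASER_DISK_LOAD_SEC / TAPE_FILES_PER_DISK   # one tape equivalent from laser memory
longword_ns = WORD_BITS * N_PE * 10**9 / DISK_RATE         # ASSUMPTION: loaded at the disk rate
instr_ns = longword_ns / INSTR_PER_LONGWORD                # implied average instruction time

print(total_disk_bits)  # 1040000000, i.e. about 10^9 bits
print(scan_seconds)     # 1.0
print(tape_seconds)     # 12.0
print(instr_ns)         # 256.0
```

The 13 Storage Units give about 1.04 × 10^9 bits, which is consistent with the quoted 10^9-bit disk. The 12-second tape figure also agrees with a 2-minute transfer of a 10-tape disk load.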

B.   Software 

A most influential characteristic of the I/O operating system is that I/O
proceeds across rows, with a minimum transfer of 16 words (1/4 of a longword).
This means that single records accessed on the disk must be stored into
memory across rows. Processing records stored across rows presents many
complex programming problems. Another approach, therefore, would be to
transpose each record from across the rows into a single column, i.e., a
single PE memory. This process is slow because of routing requirements.
However, the IMS could preprocess records into a transposed form on the B6500
or ILLIAC IV before they enter the laser memory. This approach is discussed
further in Section III B and Appendix C.
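The difference between the two layouts can be sketched by modeling memory as a grid of rows (longwords) by columns (PEMs). The toy model below is an assumption made for illustration only. It uses 4 PEs instead of 64, pads each record to one longword, and places at most one record per PEM. Reading a record back from one column then requires no access to other PE memories. On the real machine, each word moved by the transposition would need routing, which is why the text calls this step slow.

```python
from typing import List

Memory = List[List[int]]  # memory[row][pem]; each row is one longword


def load_across_rows(records: List[List[int]], n_pe: int) -> Memory:
    """Layout produced by row-wise I/O: word w of record r lands in PEM w of row r."""
    memory = []
    for rec in records:
        if len(rec) > n_pe:
            raise ValueError("this sketch assumes a record fits in one longword")
        memory.append(rec + [0] * (n_pe - len(rec)))  # pad unused PEMs with 0
    return memory


def transpose_records(memory: Memory, record_len: int) -> Memory:
    """Column layout: record r occupies rows 0..record_len-1 of PEM r only."""
    n_rows, n_pe = len(memory), len(memory[0])
    if n_rows > n_pe or record_len > n_pe:
        raise ValueError("this sketch assumes at most one record per PEM")
    columns = [[0] * n_pe for _ in range(record_len)]
    for r in range(n_rows):
        for w in range(record_len):
            # On ILLIAC IV this move crosses PEMs, i.e. it needs routing.
            columns[w][r] = memory[r][w]
    return columns


def record_in_pem(memory: Memory, pem: int, record_len: int) -> List[int]:
    """Read one record from a single PE memory; no other PEM is touched."""
    return [memory[w][pem] for w in range(record_len)]


# Toy machine with 4 PEs instead of 64; three records of three words each.
records = [[11, 12, 13], [21, 22, 23], [31, 32, 33]]
rows = load_across_rows(records, n_pe=4)
cols = transpose_records(rows, record_len=3)
print(rows[1])                    # [21, 22, 23, 0]: record 1 spread across PEMs
print(record_in_pem(cols, 1, 3))  # [21, 22, 23]: record 1 now lives in PEM 1
```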

C. Languages

Presently,  the  ILLIAC  IV  system  is  supporting  two  languages,  ASK 
and  GLYPNIR.   ASK  is  an  assembly  language  with  macro  facilities  and  GLYPNIR 
is  an  ALGOL-like  list  processing  language.   Since  GLYPNIR  can  call  ASK  sub- 
routines, it  will  be  used  as  the  major  language  to  implement  the  IMS.   For 
routines  that  rely  on  the  most  efficient  processing  available,  assembler  sub- 
routines would  be  invoked. 

A  FORTRAN-like  language  has  been  specified,  and  is  in  the  implemen- 
tation stage.   It  will  possess  all  the  special  facilities  of  GLYPNIR. 

III. Specifications for the ILLIAC IV Information Management System
A.    Introduction 

The functions of the Information Management System (IMS) may be separated
into three areas: file handling, computational facilities, and query analysis
with report generation. File handling capabilities are provided by the Data
Management System (DMS) in conjunction with the Symbolic Data Structuring
System (SDSS). The Mathematical Computation System (MCS) provides an extensive
source of mathematical and statistical routines that can operate on specific
or random samples of data items from files. Incoming queries are analyzed by
the Information Retrieval System (IRS), which controls the actions of the
other subsystems and causes reports to be generated to answer the queries.

The  functions  of  each  of  the  modules  will  be  discussed  in  the 
following  sections.   The  emphasis  has  been  placed  on  specifying  their  respon- 
sibilities.  Suggestions  for  each  module's  design  will  be  found  in  Appendices 
C,  D,  and  E.   Research  is  required  to  determine  the  performance  of  these  de- 
signs and  whether  new  approaches  should  be  proposed. 


B.   The  Data  Management  System  (DMS) 

Present information management systems spend most of their time in
bookkeeping overhead. Systems that do not provide direct access facilities
spend large amounts of processing time in sorting, matching, and merging files
before any productive processing can be done. This is a result of having to
use magnetic tapes (a tape usually holds more than most disks) for bulk
storage. There are systems that provide some direct access facilities on
subpages of files that have been placed on disks; however, these systems are
plagued by extensive tables that point to each record (e.g., the TDMS system
developed by System Development Corporation). These systems are clogged by the
queueing of I/O requests. This is especially true in time-sharing facilities
where many data sets of widely varying characteristics must be kept and
maintained for variable lengths of time. Also, many systems are forced to pay
complicated and costly garbage collection penalties. As a system grows, the
amount of code generated by these overhead routines may easily reach the
critical point where it becomes necessary to overlay system programs. For many
systems this paging becomes so costly that the system must be rewritten.

To avoid these problems, we would like to create a simple data management
system (DMS) which has a high access rate to data and is designed to minimize
queueing problems. The proposal of such a design and its details are given in
Appendix C.

The DMS provides the lowest level of support for the Information Management
System. It has been designed to provide an easily usable, general-purpose file
acquisition system to bring files from the laser memory to the ILLIAC IV disk.
Further, retrieval structures called key elements provide search arguments
which are matched against other key elements in order to retrieve a subset of
records from a file residing on the disk. These records are then passed to the
SDSS in ILLIAC IV memory for analysis.

A  separate  module  is  available  in  the  DMS  to  transfer  card  and  tape 
files  to  the  laser  memory  and  provide  any  necessary  reformatting  so  that  these 
files  can  be  directly  fed  to  disk  memory  upon  request.   File  reformatting  is 
required  by  the  DMS  because  it  is  designed  to  handle  a  very  simple  form  of 
data  structure  called  the  information  element.   The  reformatting  process  need 
only  be  done  once;  restrictions  are  placed  on  the  flexibility  of  the  file 
structure. 

There  are  three  types  of  information  elements:   key,  record,  and 
hole  elements.   Key  elements  contain  data  items  upon  which  searches  can  be 
made  to  find  collections  of  related  data  items.   A  collection  of  related  data 
item  values  is  called  a  record.   Records  are  maintained  in  record  elements. 
Whenever  a  record  is  deleted,  a  space  is  left  in  the  data  stream.   This  space 
is  identified  by  a  hole  element.   The  DMS  will  provide  a  complete  set  of  oper- 
ations to  form,  maintain,  or  delete  these  elements. 
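The three element types can be pictured as entries in one data stream. The Python sketch below is hypothetical. The field layout, the size measure (characters of data), and the idea that a key element points at its record by position are all assumptions made for illustration, since the actual formats are proposed in Appendix C. The sketch shows a search matched against key elements and a deletion that leaves a hole element in place.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class KeyElement:
    item: str     # name of the data item that can be searched on
    value: str    # its value in one record
    target: int   # position of the record element it identifies (assumed)


@dataclass
class RecordElement:
    values: List[str]  # a collection of related data item values

    @property
    def size(self) -> int:
        return sum(len(v) for v in self.values)  # assumed size measure: characters


@dataclass
class HoleElement:
    size: int  # space left in the data stream by a deleted record


Element = Union[KeyElement, RecordElement, HoleElement]


def search(stream: List[Element], item: str, value: str) -> List[RecordElement]:
    """Match a search argument against the key elements and collect their records."""
    found = []
    for element in stream:
        if isinstance(element, KeyElement) and (element.item, element.value) == (item, value):
            target = stream[element.target]
            if isinstance(target, RecordElement):  # a deleted record is now a hole
                found.append(target)
    return found


def delete_record(stream: List[Element], pos: int) -> None:
    """Replace a record element by a hole element of the same size, in place."""
    element = stream[pos]
    if not isinstance(element, RecordElement):
        raise TypeError("only record elements can be deleted")
    stream[pos] = HoleElement(size=element.size)


stream: List[Element] = [
    RecordElement(["1", "NY"]),
    RecordElement(["2", "CT"]),
    KeyElement("STATE", "NY", target=0),
    KeyElement("STATE", "CT", target=1),
]
print(search(stream, "STATE", "NY"))  # [RecordElement(values=['1', 'NY'])]
delete_record(stream, 0)
print(stream[0])                      # HoleElement(size=3)
print(search(stream, "STATE", "NY"))  # []
```

A fuller DMS would also remove or redirect the key elements of a deleted record; this sketch simply skips keys that now point at a hole.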

The system may periodically scan the disk to consolidate all the hole elements
whenever efficient performance is threatened by a lack of hole elements large
enough to accept new records. This will be handled by separate routines that
are invoked, transparently to users, whenever the performance monitor senses
the need.
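One simple way to picture consolidation is sketched below. It assumes first-fit placement of new records and a consolidation pass that slides record elements together and merges all holes into a single free area. Both policies, and the trigger (no single hole large enough), are assumptions; in the text, the decision is left to the performance monitor.

```python
from typing import List, Tuple

Segment = Tuple[str, int]  # ("R", size) for a record element, ("H", size) for a hole


def first_fit(stream: List[Segment], size: int) -> int:
    """Index of the first hole large enough for `size`, or -1 if none."""
    for i, (kind, length) in enumerate(stream):
        if kind == "H" and length >= size:
            return i
    return -1


def consolidate(stream: List[Segment]) -> List[Segment]:
    """Move every record element forward and merge all holes into one trailing hole."""
    records = [seg for seg in stream if seg[0] == "R"]
    free = sum(length for kind, length in stream if kind == "H")
    return records + ([("H", free)] if free else [])


def insert_record(stream: List[Segment], size: int) -> List[Segment]:
    """Place a record in a hole, consolidating first if no single hole fits."""
    i = first_fit(stream, size)
    if i < 0:
        stream = consolidate(stream)
        i = first_fit(stream, size)
        if i < 0:
            raise MemoryError("not enough free space even after consolidation")
    _, hole = stream[i]
    leftover = [("H", hole - size)] if hole > size else []
    return stream[:i] + [("R", size)] + leftover + stream[i + 1:]


disk = [("R", 5), ("H", 3), ("R", 4), ("H", 2)]
print(insert_record(disk, 2))  # fits the first hole, no consolidation needed
print(insert_record(disk, 4))  # [('R', 5), ('R', 4), ('R', 4), ('H', 1)]
```

In the second call, neither hole (3 and 2) can take a 4-unit record. After consolidation, the single 5-unit hole can take it.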

The performance monitor is a module that maintains all the DMS parameters and
can suggest changes to designers or users when the system encounters
inefficiencies due to unusual data base characteristics or a faulty original
design. These parameters could include the number of I/O buffers and the
buffer sizes.

The DMS will receive requests from the SDSS and must translate them into a
sequence of simple DMS commands. The DMS will notify the SDSS when the commands
have been finished.
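The request/command/notification exchange between the SDSS and the DMS might be sketched as follows. The request fields and the command names (STAGE, MATCH, TRANSFER) are hypothetical. They merely mirror the DMS duties described above: bringing a file from laser memory to disk, matching key elements, and passing qualifying records to ILLIAC IV memory.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Request:
    """Hypothetical SDSS request: records of `file` whose key item has `key_value`."""
    file: str
    key_item: str
    key_value: str


def translate(req: Request) -> List[str]:
    """Expand one SDSS request into a sequence of simple DMS commands."""
    return [
        f"STAGE {req.file}",                                 # laser memory -> ILLIAC IV disk
        f"MATCH {req.file} {req.key_item}={req.key_value}",  # compare key elements
        f"TRANSFER {req.file}",                              # qualifying records -> memory
    ]


def serve(req: Request,
          execute: Callable[[str], None],
          notify: Callable[[Request], None]) -> None:
    """Run the commands in order, then tell the SDSS the request is finished."""
    for command in translate(req):
        execute(command)
    notify(req)


executed: List[str] = []
finished: List[Request] = []
req = Request("PATIENTS", "PATIENT_NO", "1042")
serve(req, executed.append, finished.append)
print(executed)
print(finished == [req])  # True
```

Passing `execute` and `notify` in as callables keeps the translation step independent of how commands are actually carried out. This is in the spirit of the fixed inter-module languages described in Section I.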

C. The Symbolic Data Structuring System (SDSS)

To utilize the DMS directly as a comprehensive information retrieval system
would force the user to code and decode all of his file names and data
structures. This is because no system exists to handle record element
decoding. A user would have to know all the details of his data and how it
related to other data. Therefore, the DMS is not a significant tool by itself
but requires a second level of software support, the SDSS, which treats the
DMS as an interface to the ILLIAC IV Operating System. The responsibility of
the DMS will be to maintain the gross aspects of information management, such
as file-record maintenance, retrieval, and dissemination. It will be the
responsibility of the SDSS to record, in a set of tables called Symbol Tables,
the particular structuring of all files, records, and data elements throughout
the system. The details of a suggested design for the SDSS may be found in
Appendix D.

The SDSS will be designed to provide any level of access to a file contained
within the system. It uses the DMS to perform its retrieval operations. It will
give the user the ability to locate an arbitrary field or subfield, i.e., a
location within a record where data items are stored, in order to locate a
data item's value and provide that value to the user in ILLIAC IV core. It
also recognizes the subfields of keys and can enter, at the beginning of a
key, the file identification code (a code identifying the file to which the
associated record belongs).

The SDSS will use a set of Symbol Tables to define the tree structure of data
items in records. There exists one Symbol Table that defines the structure of
all the other Tables; this is called the IMS Symbol Table. Each class of users
for each data base will have a Symbol Table that is considered to be part of
the IMS Symbol Table. The first entries in the tables are a list of
alpha-numeric names for all the different data items available in a data base.
By maintaining information such as the level in the tree at which a data item
resides, the data item's parent and sisters, whether or not it is a key, etc.,
the structure of each file is uniquely defined. Whenever the DMS retrieves a
record to be examined, the SDSS looks at the keys to determine to which file
this record belongs. Data may be accessed at any level by following pointers
through the Symbol Table.
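A minimal sketch of such a Symbol Table is given below. It assumes one entry per named data item, recording the item's level, parent, children, and key status. Item names are hypothetical, and the nested Python dictionary stands in for a decoded record; the design actually proposed for the Symbol Tables is in Appendix D.3.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SymbolEntry:
    name: str
    level: int                    # depth in the data item tree (root = 0)
    parent: Optional[str] = None
    is_key: bool = False
    children: List[str] = field(default_factory=list)


class SymbolTable:
    def __init__(self) -> None:
        self.entries: Dict[str, SymbolEntry] = {}

    def define(self, name: str, parent: Optional[str] = None, is_key: bool = False) -> None:
        level = 0 if parent is None else self.entries[parent].level + 1
        self.entries[name] = SymbolEntry(name, level, parent, is_key)
        if parent is not None:
            self.entries[parent].children.append(name)

    def sisters(self, name: str) -> List[str]:
        parent = self.entries[name].parent
        if parent is None:
            return []
        return [c for c in self.entries[parent].children if c != name]

    def path(self, name: str) -> List[str]:
        """Names from the root down to `name`, found by following parent pointers."""
        names = []
        current: Optional[str] = name
        while current is not None:
            names.append(current)
            current = self.entries[current].parent
        return names[::-1]


def fetch(record: dict, table: SymbolTable, name: str):
    """Walk a decoded record along the Symbol Table path to one item's value."""
    node = record
    for step in table.path(name):
        node = node[step]
    return node


table = SymbolTable()
table.define("PATIENT")
table.define("PATIENT_NO", parent="PATIENT", is_key=True)
table.define("ADMISSION", parent="PATIENT")
table.define("ADMIT_DATE", parent="ADMISSION")

record = {"PATIENT": {"PATIENT_NO": 1042, "ADMISSION": {"ADMIT_DATE": "1970-12-01"}}}
print(table.path("ADMIT_DATE"))            # ['PATIENT', 'ADMISSION', 'ADMIT_DATE']
print(table.sisters("ADMISSION"))          # ['PATIENT_NO']
print(fetch(record, table, "ADMIT_DATE"))  # 1970-12-01
```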

The  SDSS  will  note  any  changes  made  to  the  entries  in  the  Symbol 
Table  and  will  invoke  the  proper  routines  to  restructure  the  files  containing 
these  entries.   Once  this  scheme  has  been  debugged,  complicated  file  dependent 
routines  cannot  cause  information  losses  due  to  incorrect  file  recopying. 

The  attributes  of  all  data  items  will  also  be  specified  in  the 
Symbol  Tables.   Any  system  designed  to  handle  arbitrary  data  bases  must  be 
able  to  process  data  in  any  form,  e.g.,  vectors,  matrices,  symbolic  coding, 
English  text,  etc.   To  allow  complete  flexibility  for  cross-referencing  data, 
the  data  type  "relational  link"  will  be  utilized.   It  is  essentially  a  pointer 
that  identifies  a  relation  held  between  two  nodes  of  the  same  or  different 
data  item  trees.   Algorithms  will  be  developed  to  take  advantage  of  such  link- 
ing schemes  to  reduce  the  times  to  process  certain  queries.   Appendix  D.3 
describes  a  proposed  design  for  the  Symbol  Tables. 
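A relational link can be modeled as a named pair of references, each identifying a record and a path to a node within that record's tree. In the hypothetical sketch below, following a link reaches a node in a different record directly, without a search of the file. The record identifiers, relation name, and path representation are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

NodeRef = Tuple[str, Tuple[str, ...]]  # (record id, path of item names in that record)


@dataclass(frozen=True)
class RelationalLink:
    relation: str
    source: NodeRef
    target: NodeRef


def resolve(records: Dict[str, dict], ref: NodeRef):
    """Follow a path of item names down one record's tree."""
    rec_id, path = ref
    node = records[rec_id]
    for name in path:
        node = node[name]
    return node


def follow(records: Dict[str, dict], links: List[RelationalLink],
           source: NodeRef, relation: str) -> list:
    """Values of every node reached from `source` through links named `relation`."""
    return [resolve(records, link.target) for link in links
            if link.source == source and link.relation == relation]


# Two records with different trees (hypothetical contents).
records = {
    "P1": {"TREATMENT": {"DRUG": "D-12", "DOSE": 50}},
    "S7": {"STAFF": {"NAME": "SMITH", "WARD": "B"}},
}
links = [RelationalLink("PRESCRIBED_BY", ("P1", ("TREATMENT",)), ("S7", ("STAFF", "NAME")))]
print(follow(records, links, ("P1", ("TREATMENT",)), "PRESCRIBED_BY"))  # ['SMITH']
```

Here the links are scanned linearly for brevity. A system of any size would index them by source node.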

Any  data  items  that  require  further  semantic  definitions  will  have 
pointers  to  a  descriptor  file.   The  descriptor  file  maintains  several  levels 
of  English  text  that  may  be  used  to  explain  the  meaning  and  uses  of  an  item 
in  a  data  base. 

D.   The  Information  Retrieval  System  (IRS) 

The  previous  two  subsystems,  the  DMS  and  the  SDSS,  have  been  de- 
signed to  isolate  the  housekeeping  functions  of  information  retrieval  from 
the  users  and  designers  of  the  IMS.   These  systems  will  provide  languages 
to  control  their  functions.   The  function  of  the  DMS  will  be  to  supply  in 
core  to  the  SDSS  a  set  of  qualifying  records  for  a  request.   The  SDSS  will 
then  supply  the  specific  data  items  in  question  to  the  IRS.   It  will  be  the 
function  of  the  IRS  to  translate  user  queries,  call  upon  the  required  math- 
ematical and  statistical  routines  to  operate  on  the  retrieved  data,  and  answer 
the  queries  in  various  report  forms.   The  details  for  an  initial  approach  to 
the  system  design  are  found  in  Appendix  E. 

A general search and control language will be provided. The IRS
will  interpret  the  instruction  stream  and  invoke  the  SDSS  and  Mathematical 
Computation  System  when  necessary.   The  SDSS  must  be  requested  to  save  any 
intermediate  files  generated  through  the  interactive  mode.   This  is  necessary 
because  of  the  simplified  nature  of  the  first  version  of  the  operating  system. 
Appendix  E  contains  further  details. 
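The general search and control language is described in Appendix E and is not reproduced here. The sketch below only illustrates how an interpreter can mix retrieval, computation, and conditional control (design goal 11) in one instruction stream. It uses an invented instruction set of tuples, and a small in-memory table stands in for data supplied by the SDSS.

```python
# Hypothetical data standing in for items the SDSS would deliver.
DATA = {"AGE": [34, 51, 29, 62], "STAY_DAYS": [12, 40, 7, 55]}


def interpret(program):
    """Run (op, *args) instructions; IF jumps to a LABEL when its test holds."""
    labels = {args[0]: i for i, (op, *args) in enumerate(program) if op == "LABEL"}
    env, report = {}, []
    pc = 0
    while pc < len(program):
        op, *args = program[pc]
        pc += 1
        if op == "RETRIEVE":          # ask for a data item (SDSS stand-in)
            env[args[0]] = DATA[args[0]]
        elif op == "MEAN":            # computational routine (MCS stand-in)
            values = env[args[0]]
            env[args[1]] = sum(values) / len(values)
        elif op == "IF":              # IF variable > limit GOTO label
            var, limit, label = args
            if env[var] > limit:
                pc = labels[label]
        elif op == "REPORT":          # report generator stand-in
            report.append(f"{args[0]} = {env[args[0]]}")
        # LABEL only marks a jump target and does nothing when executed
    return report


program = [
    ("RETRIEVE", "STAY_DAYS"),
    ("MEAN", "STAY_DAYS", "AVG_STAY"),
    ("IF", "AVG_STAY", 30, "SKIP"),   # report the average stay only if it is short
    ("REPORT", "AVG_STAY"),
    ("LABEL", "SKIP"),
    ("RETRIEVE", "AGE"),
    ("MEAN", "AGE", "AVG_AGE"),
    ("REPORT", "AVG_AGE"),
]
print(interpret(program))  # ['AVG_STAY = 28.5', 'AVG_AGE = 44.0']
```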

E.   The  Mathematical  Computation  System  (MCS) 

The IMS will allow researchers to retrieve data and, without additional
intermediate steps, call mathematical and statistical computation routines to
perform analyses on the data. The Mathematical Computation System (MCS) is made
up of several subsystems: the Statistical System, the Linear Programming
System, and the Modeling and Simulation System. Appendix E suggests an overall
interface design between these systems which would allow their development to
proceed independently while still allowing easy integration into the IMS, so
that they may form a highly interactive system.

F.   Using  the  Information  Management  System 

The ease with which researchers could use the IMS should be emphasized. A
hypothetical example will be used to demonstrate this system's ability to be
used for research.

Suppose that an urban sociologist has recorded data on several hundred
variables relating to the social and economic factors of a large city. Perhaps
he wishes to create a model which predicts the growth of crime as the city
expands. He may begin his study by reading his data into the IMS. With the aid
of a data manager, he would determine an efficient way to structure the data
based on his proposed research requirements. Later he may want to change this
structure. The change could easily be implemented by a few short commands to
the IMS.

Once  his  data  is  stored  he  may  wish  to  perform  a  correlation  study 
to  determine  the  most  promising  predictors.  He  would  produce  a  set  of  in- 
structions directing  the  IMS  to  retrieve  all  data  to  be  studied  and  to  form 
a new file in the form of a matrix. The command to perform a correlation
analysis  and  output  the  results  on  his  remote  terminal  would  be  in  the  same 
instruction  stream.   Based  on  the  results,  he  could  select  the  data  associated 
with  the  variable  he  would  like  to  study  further  and  reduce  the  original 
matrix  to  a  smaller  one  as  input  to  a  regression  program.   Based  on  the  equa- 
tion returned  and  the  standard  errors,  the  sociologist  may  select  several 
variables  to  be  multiplied  together  to  create  a  nonlinear  model.   Commands 
to  the  system  will  combine  these  observations  and  resubmit  the  new  matrix  to 
the  regression  program. 
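The first steps of this hypothetical study can be sketched in Python as follows: screening predictors by correlation, then fitting a regression on the most promising one. The variable names and values are invented. A simple one-predictor least-squares fit stands in for the regression program, since the text does not specify which routines the MCS would provide.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope


# Invented "matrix" of variables, one column per data item.
data = {
    "CRIME_RATE": [2, 4, 6, 8, 10],
    "UNEMPLOYMENT": [1, 2, 3, 4, 5],
    "PARK_ACRES": [5, 3, 4, 1, 2],
}
target = "CRIME_RATE"

# Correlation study: score every candidate predictor against the target.
scores = {name: pearson(col, data[target]) for name, col in data.items() if name != target}
best = max(scores, key=lambda name: abs(scores[name]))

# Regression on the most promising predictor.
intercept, slope = fit_line(data[best], data[target])
print(scores)                  # {'UNEMPLOYMENT': 1.0, 'PARK_ACRES': -0.8}
print(best, intercept, slope)  # UNEMPLOYMENT 0.0 2.0
```

A product term for a nonlinear model could be added as another column (for example, the element-wise product of two predictors) and passed to the same fitting step.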

If  an  equation  is  found  to  his  liking,  he  may  wish  to  randomly 
generate  data  to  test  out  his  model.   All  intermediate  generated  files  could 
be  saved  for  future  comparisons. 

It  should  be  noted  that  this  entire  exploration  could  have  been 
done  with  one  set  of  instructions  utilizing  the  conditional  branching  commands. 
Complex operations like these could be accomplished in a short turn-around
time because of the computational speed of the ILLIAC IV. Research could
proceed at an accelerated rate, and the researcher would not have to worry
about the details of data handling and computer methodology.

IV. List of Tasks

The  following  list  is  a  brief  outline  of  the  tasks  to  be  performed 
to  implement  the  IMS.   Once  task  (3)  has  been  completed,  we  will  have  a  basic 
information  retrieval  system  which  can  begin  processing  data,  although  the 
full  power  of  the  system  lies  in  the  completion  of  the  IRS  and  the  MCS. 

1) Study and simulate methods of accessing data through ILLIAC IV.

2) Define the DMS and SDSS according to the findings of (1).

3) Implement the DMS and SDSS.

Tasks 1, 2, and 3 provide the first level of implementation for the IMS.

4) Define the IRS.

5) Concurrently define the MCS.

6) Implement the IRS and MCS as one system (i.e., the IMS).

7)  Test  and  evaluate  the  IMS  on  an  existing  data  base. 

V. Use of an Existing Data Base for Testing and Evaluating the IMS

A.   Introduction 

Before establishing a production mode, the system should be thoroughly debugged and evaluated. Most systems use or simulate some subset of a data base to test the system. Under such contrived situations, it is impossible to determine which areas of the system need more work until real data bases are used. This is highly undesirable to the user and causes anxiety for the system designers.

For these reasons, an existing production data base has been sought. Dr. R. Cancro of the University of Connecticut, in cooperation with Dr. E. Laska and Dr. E. W. Logeman of the Information Science Division Research Center of the Rockland State Hospital, has secured permission to transfer the current state of the Rockland data base to the ILLIAC IV computer. This will provide a practical test of the design and implementation of the system as well as a valuable service to mental health researchers.

B.   A  Description  of  the  Rockland  State  Hospital  Data  Base 

The Rockland State Hospital data base is a file of information on the admittance and treatment of mental patients in six states in the northeastern U.S. This data base currently contains records on 120,000 patients, and between 500 and 1,000 records of new admissions are added to it every week. It is a two-file data base. The master file consists of the patient records filed by location. The other, auxiliary, file is keyed only by patient number and is used to find the current location of a patient, either in one of the buildings of one of the hospitals or at his home. There are many different information segments within the master file, among them the admission segments, treatment segments, drug treatment and response segments, transfer segments, termination segments, and psychiatric evaluation segments. The basic structure of the file also changes periodically with the addition of new segments.

There are, on the average, approximately 500 characters per patient record in this file. This produces an active patient data base of approximately 5 × 10^8 bits, which is 50% of the capacity of the ILLIAC IV disk.
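The size figure can be checked with a little arithmetic. The sketch below is a rough estimate only: it assumes 8-bit characters (the character size is not stated here) and takes the disk capacity as 10^9 bits, the figure used in Appendix C.

```python
# Rough size estimate for the Rockland patient file.
# ASSUMPTION: 8 bits per character; the report does not state the character size.
PATIENTS = 120_000        # patient records currently on file
CHARS_PER_RECORD = 500    # average characters per patient record
BITS_PER_CHAR = 8         # assumed character size
DISK_BITS = 10**9         # ILLIAC IV disk capacity, in bits

data_base_bits = PATIENTS * CHARS_PER_RECORD * BITS_PER_CHAR
disk_fraction = data_base_bits / DISK_BITS

print(f"{data_base_bits:.1e} bits")            # 4.8e+08 bits
print(f"{disk_fraction:.0%} of the disk")      # 48% of the disk
```

Under these assumptions the estimate comes to about 4.8 × 10^8 bits, roughly half the disk, which agrees with the rounded figures above.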
This data base is expected to expand by a factor of 10 over the next five years. The present Rockland Information System is not capable of searching the entire data base for research purposes. Given the complexity and richness of the patient file, it is clear that a considerable amount of valuable information is buried in this data base on the effectiveness of various drug treatments on different diseases and of other treatments on mental illnesses. It is desirable for researchers to be able to access this data base in a highly general way for the purpose of studying treatments. Currently, it is prohibitively expensive to determine the effects of drugs on a patient population over a short period of time. The only way in which this can be done is to take a small sample of patients and trace their histories. With these small samples, certain variations simply cannot be noted, because the current Rockland System does not have direct access to the entire patient population. For example, if a particular mental disease responds especially well to certain drugs, this information would be too costly to obtain unless the question had been asked in advance of structuring the data.

The ILLIAC IV Information Management System would allow data bases on the order of the Rockland State data base to be extensively explored by professional researchers. This would allow doctors and other professionals to study the data base on the basis of "hunches"; that is, they could ask a single question about a given drug across a whole population of 120,000 patients. If they suspected that the drug had a particular set of characteristics, and the response indicated that their hunch was true but perhaps only for a particular mental illness, they could follow this up with additional questions and find out exactly what the effect of the drug was on that patient population. They could also make tests that differentiate the effectiveness of a drug under different treatment patterns, such as administration in the morning or at night. This type of research must generally be done on an evolutionary basis. If a professional's hunch can be followed up with a reasonable set of questions that can be formulated and answered in a fairly short amount of time, the ability to extract knowledge from these large-scale data bases is greatly enhanced.
A current problem in the research area is that the question has to be highly specific and coded in a machine-dependent way. The researcher is forced to learn in detail the nature of the particular machine he is working with. He is also exposed to a tremendous amount of frustration when errors are made in coding the problem for the machine. Usually, only the most mundane questions may be asked, and the studies generally require months to perform.
Appendix B: A Glossary of Terms

The following is a list of terms, with their definitions, that are used in the course of this document. The list has been included because the uses of many data management terms are not standardized, and they may be used here in a different sense from the one to which the reviewer is accustomed.


IMS: Information Management System
DMS: Data Management System
SDSS: Symbolic Data Structuring System
IRS: Information Retrieval System
MCS: Mathematical Computation System
elementary data item: a name of an atomic piece of data
group data item: a name associated with several closely related elementary data items or several other group data items
data item: an elementary or group data item
record: any group of related data items
key: any data item which is used to identify a record
value: the actual data associated with a data item, i.e., characters, numbers, logical values
name: alphanumeric string of characters
file: all the records that correspond to a single set of associated keys
data base: any set of files
hole element: a bit string that contains no information, sometimes used by the DMS where data elements have been deleted
key element: a bit string that contains a key, used by the DMS
record element: a bit string that contains a record, used by the DMS
data element: several key elements and the associated record element
subfield: refers to the actual bit location of an elementary data item value in a record element
field: a collection of subfields or other fields
information element: a hole, key, or record element
garbage collection: the grouping of smaller hole elements into larger ones
Symbol Table: a table that defines the structure of data items within records of a user's data base
query: any sequence of instructions to the IMS
elementary question: given the value of a data item, which identifies a subset of records of a file, what are the values of an arbitrary subset of data items from the identified records?
key mask: defines the relationships necessary to fulfill a match for a key
I/O buffer: one of a set of cyclic buffers in core used to receive incoming streams of data
processing buffer: a buffer used to save records as they are found during a general search
characteristic: a name associated with a specific boolean combination of data items and other characteristics whose values are restricted by some relation
SCL: the Search and Control Language for the IMS
format: the information that defines the structure of data element fields
Appendix C: A Suggested Design for the Data Management System (DMS)

C.1 Responsibilities of the DMS

The  DMS  provides  the  lowest  level  support  for  the  IMS.   The  follow- 
ing is  a  list  of  responsibilities  assumed  by  the  DMS. 

1)  It  must  have  a  preprocessing  module  to  transfer  card, 
tape,  and  other  peripheral  files  to  the  laser  memory. 

2)  It  must  transfer  requested  files  between  the  laser 
memory  and  the  disk  and  requested  records  between  the 
disk  and  ILLIAC  IV  memory. 

3)  It  must  provide  the  capability  of  creating  and  inter- 
preting key  elements  from  specified  key  formats,  and 
record  elements  from  specified  record  formats,  and 
associate  them  to  create  data  elements  for  disk  and 
laser  dissemination. 

4) It must provide the necessary set of operations to be performed on record, key, and hole elements, files, and data bases, e.g., reformatting, deletion, insertion, searching.

5) It must provide automatic disk maintenance, i.e., hole element collection that is periodically invoked by the performance monitor module. (The user has no need to know of the existence of such a routine.)

6)  It  must  provide  a  simple  language  to  specify  the 
operations  to  be  performed  by  the  DMS. 




C.2   The  DMS  Information  Elements 

To obtain efficiency and ease of data handling, the DMS requires that all of the data it processes exist in a highly simplified form. It will be apparent that any data structure can be represented this way. An information element consists of a bit string in which there are two fixed-length fields and one variable-length field. The first field is a two-bit indicator of the type of information that exists in the third field. The second field specifies the length of the variable-length third field. To the DMS, the third field is an arbitrary bit pattern. Figures 3 and 4 are diagrams of information elements.


[Figure 3 diagram: a bit string with bit positions 0, 1, 2, 3, ..., k, k+1, ..., n, divided into fields 1, 2, and 3]

field 1: two indicator bits, specifying to the DMS which of the 3 types of information element this is (see Figure 4)
field 2: binary integer giving the number of bits in field 3
field 3: arbitrary bit pattern

Diagram of a DMS Information Element
Figure 3


1. hole elements

   | 0 | 0 | Length | Bit pattern |

   The bit pattern contains no information and is, therefore, called garbage. It indicates that this information element is available for key or record elements.

2. key elements

   | 1 | 0 | Length | Bit pattern |

   This pattern represents a key and its value. A key is any data item used to identify a class of related data items called a record.

3. record elements

   | 1 | 1 | Length | Bit pattern |

   This bit pattern represents a record, which is a collection of structured data items.

The Three Types of Information Elements
Figure 4
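To make the element layout concrete, here is a minimal sketch that builds and decodes one information element. It holds bits as strings of '0' and '1' for readability, and the width of the length field (16 bits) is an assumption, since the report leaves it open; the function names are illustrative only.

```python
# Sketch of one DMS information element (Figures 3 and 4):
#   field 1: two indicator bits: 00 = hole, 10 = key, 11 = record
#   field 2: fixed-length binary count of the bits in field 3
#   field 3: arbitrary bit pattern
# Bits are held as strings of '0'/'1' characters for readability.
# ASSUMPTION: field 2 is 16 bits wide; the report leaves its width open.
LENGTH_BITS = 16
INDICATORS = {"hole": "00", "key": "10", "record": "11"}
KINDS = {bits: kind for kind, bits in INDICATORS.items()}


def encode_element(kind: str, pattern: str) -> str:
    """Build a hole, key, or record element around a bit pattern."""
    if len(pattern) >= 2 ** LENGTH_BITS:
        raise ValueError("bit pattern too long for the length field")
    return INDICATORS[kind] + format(len(pattern), f"0{LENGTH_BITS}b") + pattern


def decode_element(bits: str, pos: int = 0) -> tuple[str, str, int]:
    """Decode the element starting at bit pos.

    Returns (kind, bit pattern, position of the next element)."""
    header_end = pos + 2 + LENGTH_BITS
    if header_end > len(bits):
        raise ValueError(f"truncated element header at bit {pos}")
    kind = KINDS.get(bits[pos:pos + 2])
    if kind is None:  # '01' is not one of the three element types
        raise ValueError(f"invalid indicator bits at bit {pos}")
    end = header_end + int(bits[pos + 2:header_end], 2)
    if end > len(bits):
        raise ValueError(f"element at bit {pos} runs past the end")
    return kind, bits[header_end:end], end


elem = encode_element("key", "1011")
print(len(elem))             # 22: 2 indicator + 16 length + 4 pattern bits
print(decode_element(elem))  # ('key', '1011', 22)
```

Returning the position of the next element lets a caller walk a stream of concatenated elements, which is how the DMS must scan a parallel bit string from a known starting point.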
Several key elements and a single related record element make up a single data element. Since a data element is composed of an arbitrary number of concatenated key elements, there exists a multiple keying potential for accessing any record.

The bit patterns in data elements are interpreted as fields of data item values by the Symbolic Data Structuring System (SDSS) described in Appendix D. Special fields within a key element's third field identify, through a table, the key and record formats for the associated data element.

The syntax that governs the occurrence of hole, key, and record elements on the disk is shown in Figure 5.
<global bit string> ::= <data base segment> <data base segment> ... <data base segment>   (there will be 6 of these)
<data base segment> ::= <quadrant bit string> <quadrant bit string> ... <quadrant bit string>   (there will be 4 of these)
<quadrant bit string> ::= <parallel bit string> <parallel bit string> ... <parallel bit string>   (there will be 64 of these)
<parallel bit string> ::= <parallel bit string> (<hole string> / <data element>) / NULL
<hole string> ::= <hole string> <hole element> / NULL
<hole element> ::= <ZERO BIT> <ZERO BIT> <length> <bit pattern>
<data element> ::= <key string> <record element>
<key string> ::= <key string> <key element> / <key element>
<key element> ::= <ONE BIT> <ZERO BIT> <length> <bit pattern>
<record element> ::= <ONE BIT> <ONE BIT> <length> <bit pattern>
<length> ::= <bit pattern>, a fixed-length string that represents a positive binary integer
<bit pattern> ::= <bit pattern> (<ZERO BIT> / <ONE BIT>) / NULL

The Syntax that Governs the Occurrence of Hole, Key, and Record Elements
Figure 5

This syntax implies that the 10^9-bit disk will be broken into 6 segments (2 SU's per segment, see Section II.A), each of which may contain one or more files from one data base, depending on the size of the files. Each of these segments is further broken into four starting locations where referencing can begin.
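The lower productions of Figure 5 can be read as a recognizer for one parallel bit string. The sketch below is illustrative, not the DMS itself: it again assumes a 16-bit length field (the report does not fix one), represents bits as '0'/'1' strings, and uses hypothetical function names. It groups each run of key elements with the record element that must follow it and rejects strings that break the grammar.

```python
# Sketch: split one parallel bit string (Figure 5) into hole elements and
# data elements. A data element is a key string (one or more key elements)
# followed by exactly one record element.
# ASSUMPTION: a 16-bit length field; the report does not fix its width.
LENGTH_BITS = 16
KINDS = {"00": "hole", "10": "key", "11": "record"}


def element(indicator: str, pattern: str) -> str:
    """Encode one information element (helper for building test strings)."""
    return indicator + format(len(pattern), f"0{LENGTH_BITS}b") + pattern


def parse_parallel_bit_string(bits: str) -> list[tuple]:
    """Return ('hole', pattern) and ('data', [key patterns], record pattern) items."""
    items, keys, pos = [], [], 0
    while pos < len(bits):
        header_end = pos + 2 + LENGTH_BITS
        if header_end > len(bits):
            raise ValueError(f"truncated element header at bit {pos}")
        kind = KINDS.get(bits[pos:pos + 2])
        if kind is None:
            raise ValueError(f"invalid indicator bits at bit {pos}")
        end = header_end + int(bits[pos + 2:header_end], 2)
        if end > len(bits):
            raise ValueError(f"element at bit {pos} runs past the end")
        pattern, pos = bits[header_end:end], end
        if kind == "key":
            keys.append(pattern)            # extend the current key string
        elif kind == "record":
            if not keys:
                raise ValueError("record element without a key string")
            items.append(("data", keys, pattern))
            keys = []
        else:
            if keys:
                raise ValueError("hole element inside a key string")
            items.append(("hole", pattern))
    if keys:
        raise ValueError("key string not followed by a record element")
    return items


stream = (element("00", "0000") + element("10", "01")
          + element("10", "11") + element("11", "101"))
print(parse_parallel_bit_string(stream))
# [('hole', '0000'), ('data', ['01', '11'], '101')]
```

Because any number of key elements may precede a record element, the same record can be found through several keys, which is the multiple keying potential noted above.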

It is the physical association of keys and records into data elements residing on the disk, along with a total disregard of actual record addresses, that eliminates large amounts of overhead experienced by other systems. Yet the average retrieval time is only 1/2 second to find an arbitrary record in a 10^9-bit data base, or just 1/20 second to find an arbitrary record in a 10^8-bit file (about the size of a tape).
C.3 Data Management on the Information Elements

C.3.1 Specifying the I/O Scheme to the I/O Subsystem of the Operating System

To specify an I/O scheme to the operating system, the data base segments (see the syntax in Figure 5 of Section C.2) are broken at equal-length points to form disk blocks. Each block is declared as a fixed-length record to the ILLIAC IV operating system and, as such, is addressable. Each data base segment is declared as a fixed-length file and is therefore addressable by name. The DMS will issue an almost continuous sequence of I/O requests in order to stream the hole and data elements of a data base segment into the ILLIAC IV PE memories [one parallel bit string (see the syntax in Figure 5 of Section C.2) in each PE memory] to be processed.

For simplicity, all hole, key, and record elements could be arranged to begin on word boundaries and occupy a whole number of words. However, the information element lengths will still be specified in bits to determine the garbage bits in the last word of each element.
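Under the word-boundary option, an element's length in bits determines both how many words it occupies and how many garbage bits pad its last word. A minimal sketch using the 64-bit ILLIAC IV word; the function name is illustrative.

```python
WORD_BITS = 64  # ILLIAC IV word size


def words_and_garbage(element_bits: int) -> tuple[int, int]:
    """Words occupied by a word-aligned element, and garbage bits in its last word."""
    words = -(-element_bits // WORD_BITS)  # ceiling division
    return words, words * WORD_BITS - element_bits


print(words_and_garbage(150))  # (3, 42)
print(words_and_garbage(128))  # (2, 0)
```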

Figure 6 shows a typical information element distribution within a buffer after it has been read into ILLIAC IV memory. ILLIAC IV buffers correspond to disk blocks. The preprocessing module could reformat all incoming files before they are entered on the laser memory. This would avoid any row-to-column transpositions by the ILLIAC IV, which are time consuming, as stated in II.B. It must be understood that a record's data no longer remains contiguous on the disk as in other schemes. This is because records are written "down" PE's while I/O proceeds across rows. Records, therefore, can only be referenced by the buffers in which they reside, and entire buffers must be read to find a particular record (in the process, 63 other parallel bit strings are also brought in).
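The row-and-column layout can be sketched with one parallel bit string per PE, where longword r holds word r of every PE. This is a simplified model under stated assumptions: 64 PEs and 64-bit words, bits as '0'/'1' strings, a short final word padded with '0' bits (the report does not say how), and hypothetical function names.

```python
WORD_BITS = 64  # bits per word
NUM_PES = 64    # processing elements; one parallel bit string per PE


def to_longwords(parallel_strings: list[str]) -> list[list[str]]:
    """Lay out one parallel bit string per PE. Longword r holds word r of
    every PE, so each string runs "down" its own PE column.
    ASSUMPTION: short final words are padded with '0' bits."""
    if len(parallel_strings) != NUM_PES:
        raise ValueError("expected one parallel bit string per PE")
    n_rows = max(-(-len(s) // WORD_BITS) for s in parallel_strings)
    return [[s[r * WORD_BITS:(r + 1) * WORD_BITS].ljust(WORD_BITS, "0")
             for s in parallel_strings]
            for r in range(n_rows)]


def pe_column(longwords: list[list[str]], pe: int) -> str:
    """Reassemble PE pe's bit string by reading down its column."""
    return "".join(row[pe] for row in longwords)


# A 100-bit record held by PE 5 spans two longwords, so both rows (128 bits
# from each of the 64 PEs) must be brought in to read it.
strings = ["1" * 100 if pe == 5 else "" for pe in range(NUM_PES)]
rows = to_longwords(strings)
print(len(rows), pe_column(rows, 5)[:100] == "1" * 100)  # 2 True
```

The example shows why a record is addressable only through the buffer that contains it: reading the rows it occupies necessarily brings in the corresponding words of all the other PEs.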

Blocks of rows (i.e., longwords) of ILLIAC IV memory are treated as a set of I/O buffers which accept the quadrant bit string buffers. This means that the operating system reads or writes one I/O buffer while the DMS is processing other, previously filled I/O buffers.

[Figure 6 diagram: parallel bit strings running down the PE memories, with longwords (rows) numbered from i; one buffer load (m longwords) followed by another buffer]

A Typical Information Element Distribution within a Buffer
Figure 6

This system is penalized by disk latency as well as by the fact that all searches must be started at the beginning of a parallel bit string. To pick up the system elements, we must have a known starting point where a hole, key, or record element begins in each parallel bit string; otherwise, we cannot tell what the bits represent. Thus, there would be an average disk latency of 1/2 rotation to get to a starting point. The latency is reduced by introducing four starting points, i.e., quadrant bit strings, which divide each SU of the disk into four sections. The expected starting latency is now only 1/8 rotation (5 ms). Only buffers that begin on quadrant boundaries will have the beginning of an information element in each word of the first longword. An example of such a buffer follows in Figure 7.


[Figure 7 diagram: one buffer at the beginning of a quadrant, longwords 1 through m; in longword 1, each word j (0 ≤ j ≤ 63) begins with a key, hole, or record element]

A Beginning Quadrant
Figure 7


The end of a quadrant must have at least k bits in each parallel bit string (unless an element runs to the end of the last buffer; see Figure 3) in order to declare a hole. We cannot "write" information elements across quadrant boundaries.

I/O status is to be synchronized by the control unit (CU) via the I/O monitor. If I/O buffer processing exceeds a critical time, I/O must be stopped, and an entire disk revolution (40 ms) is lost until processing can begin at the same point. Such system parameters as buffer size, buffer phasing on the disk, and the number of processable elementary questions per batch could be varied according to the particular query or user class in order to avoid I/O catastrophes, i.e., stopping I/O.
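The starting-latency figures given earlier in this section follow from a simple model: with q evenly spaced starting points, the mean wait for the next one is 1/(2q) rotation. A sketch using the 40 ms revolution implied by the "1/8 rotation (5 ms)" figure; the uniform-arrival assumption and the function name are ours.

```python
ROTATION_MS = 40.0  # one disk revolution, in milliseconds


def expected_start_latency_ms(starting_points: int) -> float:
    """Mean wait until the next starting point passes under the heads,
    assuming evenly spaced starting points and a uniformly random
    rotational position when the request is issued."""
    return ROTATION_MS / (2 * starting_points)


print(expected_start_latency_ms(1))  # 20.0  (1/2 rotation)
print(expected_start_latency_ms(4))  # 5.0   (1/8 rotation, quadrant starts)
```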

Each PE maintains the status of I/O buffer processing, i.e., current bit counts, information element type, keys on which the search is being conducted, and the status of key comparisons being made. Searches and comparisons are executed on groups of 64 or fewer bits (the word size) at a time.
C.3.2 Using the DMS

All requests processed by the DMS involve keys and files. These are specified by name to the SDSS, which must check a Symbol Table to determine protection requirements, validity, structure, and location. If the location of a key is determined to be within a file that is not presently on the disk, the off-disk file maintenance module must locate it in the laser memory and pass it to the disk. This assumes that the files are formatted so that they can be fed directly into holed-out (available) buffers on the disk. It may be necessary to collect garbage within a data base segment before such a file request is made, in order to create room for the file. New files should be preprocessed so that they are compatible with the rest of the DMS file structures before being entered on the laser memory. This could be done by B6500 routines so that the ILLIAC IV processors do not waste valuable time on mundane housekeeping, although some of it cannot be avoided.

Another system parameter is the maximum number of keys (in a boolean combination of keys) on which any single search may be conducted. Boolean combinations of different keys will be allowed as search arguments. Searching may be done on part of a key, e.g., the last name in a full-name key, and may use any of several relational forms. For example, equal to, greater than, not equal to, and boolean combinations of these are possible searching conditions. Partial pattern matches could be performed on alphanumeric data. These conditions will be represented by a key mask accompanying each key element used as a search argument.
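One way to picture a key mask is as a selected part of the key plus a relation to test against an argument, with masks combined by boolean operators. The report does not define the mask's encoding, so the `KeyMask` structure, the relation names, and the 10-character last-name layout below are all hypothetical.

```python
import operator
from dataclasses import dataclass

# Hypothetical key mask: the report says only that a mask accompanies each
# key used as a search argument and encodes relational and partial-match
# conditions. Here a mask selects part of the key and one relation.
RELATIONS = {
    "=": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
    "prefix": lambda part, arg: part.startswith(arg),  # partial pattern match
}


@dataclass
class KeyMask:
    start: int      # first character of the key part being compared
    end: int        # one past the last character compared
    relation: str   # a name from RELATIONS
    argument: str

    def matches(self, key_value: str) -> bool:
        return RELATIONS[self.relation](key_value[self.start:self.end], self.argument)


# Full-name keys: a 10-character last name followed by the first name.
keys = ["SMITH".ljust(10) + "JOHN", "SMYTHE".ljust(10) + "ANN",
        "JONES".ljust(10) + "MARY"]
last_name_sm = KeyMask(0, 10, "prefix", "SM")
not_smith = KeyMask(0, 10, "!=", "SMITH".ljust(10))

# Boolean combination of masks: last name begins "SM" AND is not SMITH.
print([k for k in keys if last_name_sm.matches(k) and not_smith.matches(k)])
# ['SMYTHE    ANN']
```

In the DMS the comparisons would run in each PE, word by word, against the key elements streaming through its I/O buffers; the list comprehension here only stands in for that search.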

Requests  to  the  DMS  will  be  translated  into  sequences  of  DMS  instruc- 
tions.  The  requests  are  viewed  as  a  high-level  language  program  that  must  be 
compiled  into  "DMS  machine  code"  (i.e.,  DMS  instructions)  to  drive  the  "DMS 
machine",  that  is,  a  "machine"  which  is  simulated  by  ILLIAC  IV  code.   The  re- 
quests will  come  in  some  form  of  key  element,  key  mask,  and  command  triples. 

Some  samples  are: 

1.  INSERT  KEY  =  X_DATA_ELEMENTS  WITH  A  KEY  =  X  AND  MASK  =  (...) 

2.  READ  RECORDS  WITH  KEY  -  AND  MASK  =(...)  OR  KEY  =  Z  AND 
MASK  =  (...) 
3. FIND HOLE WITH LENGTH = <1024 BITS
4. WRITE FILE = HOSPITAL FROM DISK TO LASER
5.   COLLECT  GARBAGE 

When a match is located on a read request, for example, the record is moved from the I/O buffer into a processing buffer in the same PE, if possible. If a PE finds more records than it can accommodate, the I/O monitor is responsible for any necessary packing or routing. Pointers are returned to the calling modules to indicate where the retrieved records are located.

A curious result of this system is that, after some processing, files that currently reside on the disk no longer have any physical resemblance to the general file concept of concatenated records, since their records are scattered randomly throughout the data base segment in which they reside. This happens when records are retrieved, processed (causing an expansion, contraction, or update), and then rewritten on the disk. The write request issues a search for the first hole equal to or larger than the record, which could be anywhere in the segment. This shuffling of records from files of the same data base would be advantageous for simultaneous multiple-file processing: the searching time for finding two complementary records from different files would, on the average, be reduced.
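The write request's first-fit search, which also produces the small holes discussed under garbage collection in C.3.3, can be sketched as follows. The segment is modeled as an ordered list of (kind, length) entries; header bits are ignored for simplicity, and the names are illustrative rather than part of the DMS design.

```python
class NoHoleFound(Exception):
    """No hole in the segment is large enough for the record."""


def first_fit_write(segment: list[tuple[str, int]], record_bits: int) -> int:
    """Write a record into the first hole equal to or larger than it.

    segment is an ordered list of (kind, length in bits) entries.
    ASSUMPTION: lengths count whole elements, so header bits are ignored.
    Returns the index at which the record was written."""
    for i, (kind, length) in enumerate(segment):
        if kind == "hole" and length >= record_bits:
            segment[i] = ("record", record_bits)
            if length > record_bits:
                # the leftover space becomes a smaller hole after the record
                segment.insert(i + 1, ("hole", length - record_bits))
            return i
    raise NoHoleFound(f"no hole of at least {record_bits} bits")


segment = [("record", 50), ("hole", 40), ("record", 50), ("hole", 40)]
first_fit_write(segment, 30)
first_fit_write(segment, 30)
print(segment)
# [('record', 50), ('record', 30), ('hole', 10), ('record', 50), ('record', 30), ('hole', 10)]
```

After these two writes, 20 bits are free in total, yet a 15-bit record no longer fits anywhere; this is the fragmentation that garbage collection removes.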

Although records from different files of the same data base are completely shuffled in a data base segment, the system experiences no deterioration, and no additional routines are required to unscramble records, since the DMS is not concerned with where, who, or what is on the disk between off-file requests. Thus, two or more users may be simultaneously accessing the same file, yet can be completely unconcerned with the increased file activity and complexity being experienced by the DMS.

When a file is to be returned to the laser memory, if a change has taken place, the records in that file are brought together in consecutive disk blocks. The blocks become records for the laser file. A file write from the ILLIAC IV disk to the laser memory is then requested.
C.3.3 Garbage Collection

Disk information element maintenance will primarily be in the form of garbage collection, the collapsing of small hole elements into larger ones. It is desirable to collect garbage, often without the user's knowledge. This avoids the deterioration in performance caused by a lack of usable disk space when many hole elements are too small to be of use. This condition arises because whenever a write-a-record-element request is executed, the system finds the first hole equal to or larger than the record within the corresponding data base segment. If the hole is larger, a new hole, smaller than the original by the length of the record, is created at the end of the record. Eventually, smaller and smaller holes become randomly distributed along the segment. (The garbage collection process is shown in Figure 8.)


[Figure 8 flowchart: Begin reading a set of I/O blocks into buffers from the data base segment → Recopy a buffer, leaving out holes, into another I/O buffer → Output a set of I/O buffers]

Garbage Collection
Figure 8

This tends to push holes to the "end" of the data base segments on the disk, which suggests a good place to start looking for holes. The entire process can be completed in one pass over the segment.
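The single-pass procedure of Figure 8 can be sketched on the same simplified segment model, an ordered list of (kind, length) entries. As an assumption for brevity, lengths count whole elements, so the header bits saved by merging holes are ignored; the function name is illustrative.

```python
def collect_garbage(segment: list[tuple[str, int]]) -> list[tuple[str, int]]:
    """One pass over a segment: recopy every key and record element in order,
    leaving out holes, and put all reclaimed space in one hole at the end.
    ASSUMPTION: lengths count whole elements, so merged hole headers are ignored."""
    kept = [entry for entry in segment if entry[0] != "hole"]
    free_bits = sum(length for kind, length in segment if kind == "hole")
    return kept + ([("hole", free_bits)] if free_bits else [])


fragmented = [("record", 50), ("record", 30), ("hole", 10),
              ("record", 50), ("record", 30), ("hole", 10)]
print(collect_garbage(fragmented))
# [('record', 50), ('record', 30), ('record', 50), ('record', 30), ('hole', 20)]
```

Keeping the surviving elements in their original order means the recopy needs no tables, and the single trailing hole is where the next first-fit search is most likely to succeed.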
C.3.4 Comments on Direct Accessing

If  a  direct  access  system  would  greatly  increase  efficiency  for  some 
special  problem,  then  in  one  pass  over  the  disk  (one  second  or  less)  a  temporary 
inverted  file  can  be  created.   This  file  would  tell  what  records  are  stored  in 
each  block  by  saving  the  key  values  of  the  records  in  a  table  along  with  all 
the  corresponding  addresses  to  these  records.   Addresses  are  denoted  by  the 
block  number,  PE  number,  and  displacement  from  beginning  of  the  block. 
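Building such a temporary inverted file amounts to collecting, during one pass, the address of every key element encountered. A minimal sketch, assuming the pass yields (block, PE, displacement, key value) tuples; the tuple layout, key values, and names are hypothetical.

```python
from collections import defaultdict

Address = tuple[int, int, int]  # (block number, PE number, displacement in block)


def build_inverted_file(key_occurrences) -> dict[str, list[Address]]:
    """Build a temporary inverted file from one pass over the disk.

    key_occurrences yields (block, pe, displacement, key value) for each
    key element encountered; the result maps each key value to the
    addresses of the records it identifies."""
    table = defaultdict(list)
    for block, pe, displacement, key_value in key_occurrences:
        table[key_value].append((block, pe, displacement))
    return dict(table)


scan = [(0, 3, 128, "WARD 12"), (0, 9, 0, "WARD 7"), (7, 3, 64, "WARD 12")]
print(build_inverted_file(scan)["WARD 12"])  # [(0, 3, 128), (7, 3, 64)]
```

The table grows with every key occurrence, which is the growth problem raised in the next paragraph.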

The main arguments against the general use of the direct access facility described above are that tables can grow very fast for large data bases, and these tables must be paged in and out, adding to the general queue of I/O requests. Furthermore, single direct access I/O requests are seldom encountered in real query processing, since we are generally concerned with a set of records.

It is not efficient to process single elementary questions that require the location of only one record. Elementary questions are of the form: Given a data item and its value (which identifies a record out of a set of records), what is the value of another data item in this set of records? Instead, it is to our advantage to batch the many elementary questions which are needed to answer any translated query, and retrieve all the required records in one data base segment pass. The system will not deteriorate continuously as the number of elementary questions increases, because the time to process one batch is roughly the time it takes to pass over the data base segment, 1/10 second or less (which is adequate for interactive purposes). At some threshold batch size, there is a discrete jump to two passes. For a table-driven system, however, performance is approximately continuously related to the complexity of the query, i.e., the number of elementary questions generated. This is partly due to the overhead of repeated accesses to tables, searches through the tables, table maintenance routines, and I/O requests associated with table swapping and paging. This is not a linear relationship; procedure times increase faster than the number of records being processed. Also, the restricted queueing of I/O requests for individual records allows only a few accesses to be performed on a single rotation of the disk. Therefore, we can reasonably expect the relationships shown in Figure 9.
The DMS vs. Direct Access Systems: Performance on Processing Batches of Elementary Questions
Figure 9

For most batching, the system would be expected to operate at some number of questions just below a multiple of N (the threshold batch size at which another pass becomes necessary) to optimize query answer times.
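The step behavior described above can be modeled as one segment pass per group of up to N questions. A sketch, assuming each pass takes the full 1/10 second quoted earlier and using a hypothetical N of 100 (the report gives no value).

```python
import math

PASS_SECONDS = 0.1  # one data base segment pass ("1/10 second or less")


def dms_batch_seconds(questions: int, n: int) -> float:
    """Time to answer a batch of elementary questions when one segment pass
    can serve up to n of them (n is the threshold batch size N)."""
    return math.ceil(questions / n) * PASS_SECONDS


# With a hypothetical N = 100, answer time is a step function of batch size:
# 0.1 s for 1 to 100 questions, 0.2 s for 101 to 200.
for q in (1, 99, 100, 101, 200):
    print(q, dms_batch_seconds(q, 100))
```

A batch just at or below a multiple of N gets the most questions out of each pass, while one question more costs an entire extra pass, hence the suggestion to operate just below a multiple of N.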

C.3.5   Comments  on  the  DMS 

Figure  10  is  a  diagram  of  the  data  flow  and  control  between  various 
modules  of  the  DMS.   Further  analysis  is  required  to  determine  which  modules 
can  be  executed  efficiently  on  the  ILLIAC  IV  or  should  run  on  the  B6500. 

The system performance monitor module exists for two purposes. First, it simplifies system development by localizing control over system parameters. Second, it provides for constant system evaluation at the production level and can suggest parameter value changes for different user requirements.


[Figure 10 diagram; modules shown: DMS Requests, Interpreter and/or Compiler, DMS Control, Performance Monitor, I/O Monitor, I/O Subsystem, Preprocessing Module (input from tapes and cards), Off-Disk File Maintenance, Laser Memory; arrows distinguish information flow from direction of control]

The Control and Information Flow Between the Various Modules of the DMS
Figure 10


Some general statements can be made about the advantages of this system in comparison to other schemes. Garbage collection is not time consuming, nor is the associated code long. It does not expose files to the loss that can accompany the recopying done by various types of file maintenance routines. There is no need to segment large records into overflow areas. Performance does not deteriorate continuously as data activity increases. Vast assortments of large tables are not needed, nor are the many different associated table maintenance routines. The system is conceptually simple, which eliminates anxiety for users as well as for design implementors. Most important, it provides a conceptually solid yet simple foundation on which more sophisticated levels of software can be built.




Appendix D: A Suggested Design for the Symbolic Data Structuring System (SDSS)
D.1 Responsibilities of the SDSS

The  following  responsibilities  are  assumed  by  the  SDSS: 

1.  The  creation  and  maintenance  of  the  structure  of  each  record,  i.e., 
the  record  format. 

2.  The  input,  retrieval,  deletion,  and  testing  of  arbitrary  data  items 
within  records. 

3. The specification of any data items as keys.

4. Provide file security.

5. Provide minimum risk of information loss when restructuring or recopying
files.

6.  Allow  for  all  forms  of  data  and  their  various  attributes. 

7.  Maintain  sufficiently  descriptive  information  on  all  data. 

8.  Allow  for  naming  and  retention  of  user  defined  "data  characteristics" 
as  boolean  combinations  of  relational  conditions  on  data  items. 

D.2   The  Entities  of  Data  Structuring 

This  system  is  designed  to  deal  with  the  following  entities: 

1.  Fields 

2. Subfields

3. Elementary Data Items

4. Group Data Items

5.  Records 

6.  Record  Formats 

7.  Keys 

8.  Key  Formats 

9.  Files 

10.  Data  Bases 

At the lowest level are elementary data items, which are names of the
least divisible pieces of data, such as EMPLOYEE NO, MONTH OF EMPLOYMENT, and
DAY OF EMPLOYMENT. A group data item is a name associated with several
elementary data items. Group data items may also be names for a collection of
other group data items. For example, DATE OF EMPLOYMENT is a name for the
group of elementary data items MONTH OF EMPLOYMENT, DAY OF EMPLOYMENT, and
YEAR OF EMPLOYMENT. EMPLOYEE RECORD is a group data item which includes
several other group data items. When it is not necessary to distinguish between
elementary and group data items, the term data items will be used. A record
is any group of related data items that is structured in a hierarchical fashion.
This structuring yields a tree data structure for records, where each node
represents a data item and the leaves represent elementary data items. Figure 11
is an example of the tree structure of a record.
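The record tree just described can be sketched as follows. This is a minimal illustration, not the SDSS's internal representation: the class name `DataItem` and its methods are hypothetical, and EMPLOYEE RECORD is simplified to two daughters. A node with no daughters is an elementary data item; any other node is a group data item.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class DataItem:
    """One node of a record's data tree (hypothetical representation).
    A node with no children is an elementary data item; any other node
    is a group data item."""
    name: str
    children: list[DataItem] = field(default_factory=list)

    def is_elementary(self) -> bool:
        return not self.children

    def elementary_items(self) -> list[str]:
        """Names of the leaves (elementary items) under this node, left to right."""
        if self.is_elementary():
            return [self.name]
        names: list[str] = []
        for child in self.children:
            names.extend(child.elementary_items())
        return names


date_of_employment = DataItem("DATE OF EMPLOYMENT", [
    DataItem("MONTH OF EMPLOYMENT"),
    DataItem("DAY OF EMPLOYMENT"),
    DataItem("YEAR OF EMPLOYMENT"),
])
employee_record = DataItem("EMPLOYEE RECORD", [
    DataItem("EMPLOYEE NO"),
    date_of_employment,
])

print(employee_record.elementary_items())
```

Referring to DATE OF EMPLOYMENT by name thus reaches its three elementary items without the user naming them individually.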
Any data item may be used as a key. A key in this context is any
data item which is used to identify other data items. The value of a key
identifies a particular subset of records out of a set of records. The DMS
structures the input data so that each record can be accessed by an arbitrary
number of keys, as decided upon by the people who are generating the data base.
For example, a man's patient number may identify a whole group of information
about him. This group could include name, address, insurance policies, drug
types, and others. There may be several keys by which one wants to find his
data. One would certainly want to find it by the man's name. It is also
possible that this data might be located by his illness (but perhaps he has
several). This last example is indicative of a more general form of key, i.e.,
a key which has several values per record, in contrast with a key that has only
one, such as the man's name. This general type of key presents no additional
burden on the IMS.
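One way to see why a multi-valued key adds no burden is an inverted index that maps each key value to the set of records containing it. The sketch below assumes that representation (the proposal does not prescribe one); the record contents and the function name `build_index` are hypothetical.

```python
from collections import defaultdict

# Hypothetical patient records; ILLNESS may hold several values per record.
records = {
    101: {"NAME": "SMITH", "ILLNESS": ["FLU", "ASTHMA"]},
    102: {"NAME": "JONES", "ILLNESS": ["ASTHMA"]},
    103: {"NAME": "BROWN", "ILLNESS": []},
}


def build_index(records, key):
    """Map each value of `key` to the set of record ids having that value.
    Single-valued and multi-valued keys are handled by the same loop."""
    index = defaultdict(set)
    for rid, rec in records.items():
        value = rec.get(key)
        values = value if isinstance(value, list) else [value]
        for v in values:
            if v is not None:
                index[v].add(rid)
    return index


by_name = build_index(records, "NAME")
by_illness = build_index(records, "ILLNESS")
print(sorted(by_illness["ASTHMA"]))  # [101, 102]
print(sorted(by_name["SMITH"]))      # [101]
```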

The user will not be required to know which data elements are
keys. He will be able to specify any data element as a search argument when
looking for the value of an associated data item. However, in structuring the
original records, the user should choose keys in the manner which he feels would
give the most efficient structure for his data with respect to searching.
Frequent users would benefit from knowing that a generalized search on a data
item which is not declared to be a key could be relatively expensive, because
every record in every file containing this data item would have to be examined
to determine the match. The DMS will permit several keys to be associated with
a record. A file is then defined as all the data item values of all the records
that correspond to a single set of associated keys.
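The cost difference between searching on a key and on a non-key data item can be shown with a sketch that counts records examined. It assumes keys are served by a prebuilt value-to-records index, a hypothetical stand-in for the DMS key structures, and it covers one file only.

```python
def search(records, indexes, item, value):
    """Return (matching record ids, number of records examined).

    If `item` is a declared key, its index answers the question and no
    records are scanned; otherwise every record must be examined."""
    if item in indexes:
        return set(indexes[item].get(value, set())), 0
    matches = {rid for rid, rec in records.items() if rec.get(item) == value}
    return matches, len(records)


# Hypothetical file of 1000 records; only PATIENT_NO is declared a key.
records = {i: {"PATIENT_NO": i, "BUILDING_NO": i % 3} for i in range(1, 1001)}
indexes = {"PATIENT_NO": {rec["PATIENT_NO"]: {rid} for rid, rec in records.items()}}

hits, examined = search(records, indexes, "PATIENT_NO", 42)
print(hits, examined)        # {42} 0
hits, examined = search(records, indexes, "BUILDING_NO", 0)
print(len(hits), examined)   # 333 1000
```

Both searches return correct answers; only the non-key search has to examine the whole file.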

A data base is defined as an associated set of files. Each class of
users can maintain one data base. The IMS can then maintain several data bases
for different classes of users. The lattice-tree structure of the IMS is shown
in Figure 12. Any further cross-referencing of nodes is accomplished through
elementary data items whose values have the data item type link. This is
elaborated on in Appendix E.
Physically, group and elementary data items are represented by fields and
subfields, respectively. These are the actual groups of bits that contain the
values of data items. The slicing of the key elements and record elements into
fields and subfields is specified by key and record formats. An example of a
key and record format is shown in Figure 13.
00 FILE_NAME IS PATIENT_FILE
  01 PATIENT_RECORD
    02 NAME KEY
      03 FIRST_NAME CHARACTER (VARIABLE)
      03 LAST_NAME CHARACTER (VARIABLE)
    02 PATIENT_NO INTEGER (9) KEY
    02 LOCATION
      03 BUILDING_NO INTEGER (2)
      03 ROOM_NO INTEGER (3)
    02 ILLNESS REPETITIVE
      03 ILLNESS_NAME CHARACTER (15) KEY
      03 DRUG REPETITIVE
        04 DRUG_NAME CHARACTER (15)
        04 DRUG_FREQUENCY CHARACTER (1)

Example Key and Format Declaration
Figure 13

Note  the  use   of  the   data  item  types,   REPETITIVE   and  VARIABLE. 
Repetitive   data  items  are  those  that   exist   in  a  variable  number   for  each 
record  but   each  repetition  has  the   same   structure.      Variable  data  items  are 
those  that  have  variable   lengths.      Included  as  variable   length  data  are  those 
items  that   don't  always  have  values  for  each  record.      Notice   also,    that   any 
element   containing  a  variable  or  repetitive  element   is,    in  turn,   variable. 
These  features  allow  for  efficient  packing  of  data  into  records  by  avoiding 
many  null  entries   caused  by  allowing  only  fixed  formatting   of  records. 
ILLIAC  IV  processing   speed  easily  warrants  this   flexibility.      There   is  plenty 
of  time  to  pack,   unpack,    and  pack  again. 
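The rule that an element containing a variable or repetitive element is itself variable is a simple recursion over the record tree. The sketch below mirrors the Figure 13 declaration. The `Item` class and its flags are hypothetical, and it treats a REPETITIVE item as variable in size, consistent with the Size subfield described later in this appendix.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Item:
    name: str
    variable: bool = False    # declared VARIABLE
    repetitive: bool = False  # declared REPETITIVE
    children: list[Item] = field(default_factory=list)


def is_variable_length(item: Item) -> bool:
    """True if the item is declared VARIABLE or REPETITIVE, or if any
    subitem at any depth is variable length."""
    if item.variable or item.repetitive:
        return True
    return any(is_variable_length(c) for c in item.children)


# The PATIENT_RECORD of Figure 13 (types and widths omitted).
record = Item("PATIENT_RECORD", children=[
    Item("NAME", children=[Item("FIRST_NAME", variable=True),
                           Item("LAST_NAME", variable=True)]),
    Item("PATIENT_NO"),
    Item("LOCATION", children=[Item("BUILDING_NO"), Item("ROOM_NO")]),
    Item("ILLNESS", repetitive=True, children=[
        Item("ILLNESS_NAME"),
        Item("DRUG", repetitive=True,
             children=[Item("DRUG_NAME"), Item("DRUG_FREQUENCY")]),
    ]),
])

for child in record.children:
    print(child.name, is_variable_length(child))
```

Only LOCATION and PATIENT_NO remain fixed-length, so they alone can be packed at known positions.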
D.3 The Data Item Symbol Tables

The structure and names of all data bases, files, keys, records, and
data items will be found in a group of Symbol Tables. The IMS has a Symbol
Table that names all the data bases, and each data base in turn has a Symbol
Table (i.e., each class of users has a Symbol Table). The location of the IMS
Symbol Table is always known to the IMS. It allows for the bootstrap
location of a user's Symbol Table, which is then placed in core. Each user
Symbol Table is a manageable file and thus can be manipulated by the IMS,
i.e., it can be modified with the same set of instructions as those used to
modify other files. The structure of the Symbol Tables is known to the system
by referencing the IMS Symbol Table, while the structure of user data is known
by referencing the appropriate user Symbol Table.
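The bootstrap lookup can be sketched as a two-level table search: the IMS Symbol Table gives the location of a data base's Symbol Table, which is loaded into core once and then consulted for item descriptions. The dictionaries standing in for disk and core, the location string, and the function names are all hypothetical.

```python
# Hypothetical stand-ins: the IMS Symbol Table, disk storage, and core.
IMS_SYMBOL_TABLE = {
    # data base name -> location of that data base's Symbol Table
    "X_HOSPITAL_DATA_BASE": "disk:block-17",
}
DISK = {
    "disk:block-17": {
        "PATIENT_NO": {"type": "INTEGER", "size": 7, "key": True},
        "TOTAL_PREMIUMS": {"type": "DECIMAL", "size": 7, "key": False},
    },
}
core = {}  # user Symbol Tables already brought into core


def user_symbol_table(data_base: str) -> dict:
    """Find a user Symbol Table through the IMS Symbol Table and load it
    into core on first use."""
    if data_base not in core:
        location = IMS_SYMBOL_TABLE[data_base]
        core[data_base] = DISK[location]
    return core[data_base]


def describe(data_base: str, item: str) -> dict:
    """Return the Symbol Table description of a data item, by name."""
    return user_symbol_table(data_base)[item]


print(describe("X_HOSPITAL_DATA_BASE", "PATIENT_NO"))
```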

An  important  feature  of  the  system  is  that  a  modification  of  the 
Symbol  Table  implies  that  the  structure  of  a  file  in  the  system  has  been 
changed.   Changes  in  a  Symbol  Table  force  file  modifications  to  take  place. 
This  means  the  Control  Module  of  the  SDSS  will  detect  the  changes  to  the 
Symbol Table, i.e., entries, deletions, corrections, updates, etc., and will
call  the  proper  algorithms  to  carry  out  the  implied  modifications. 
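The Control Module's task of detecting Symbol Table changes can be pictured as comparing the old and new versions of a table and listing the implied record modifications. This sketch assumes a table is a mapping from item name to description; the action names `add`, `delete`, and `restructure` are hypothetical labels, and the algorithms they would call are not shown.

```python
def detect_changes(old: dict, new: dict) -> list[tuple[str, str]]:
    """Compare two versions of a Symbol Table (item name -> description)
    and list the modifications they imply for the file's records."""
    changes = []
    for name in old.keys() - new.keys():
        changes.append(("delete", name))       # remove subfield from every record
    for name in new.keys() - old.keys():
        changes.append(("add", name))          # insert subfield into every record
    for name in old.keys() & new.keys():
        if old[name] != new[name]:
            changes.append(("restructure", name))  # e.g. size or type changed
    return sorted(changes)


old = {"PATIENT_NO": {"size": 7}, "ROOM_NO": {"size": 3}}
new = {"PATIENT_NO": {"size": 9}, "BUILDING_NO": {"size": 2}}
print(detect_changes(old, new))
```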

Every  subfield  (data  item)  of  a  user's  data  is  described  in  a  Symbol 
Table.   Some  of  the  entries  indicate  the  length  of  the  subfield,  position  in 
the  data  tree,  type  and  attributes.   Among  other  things,  the  character  (alpha- 
betic) name  of  the  data  item  is  present  so  that  once  a  data  item's  subfield 
has  been  recorded  it  can  thereafter  be  referenced  by  name.   In  this  fashion, 
the  user  will  never  have  to  become  "bogged  down"  with  the  computer  jargon  for 
field  locations. 

The  table  also  allows  the  system  to  be  aware  of  keys  and  data  items 
which  are  associated  with  more  than  one  file.   For  example,  an  EMPLOYEE  NO. 
might  be  a  key  for  an  insurance  file  as  well  as  a  salary  file.   If  one  wanted 
to  find  the  average  insurance  premium  for  persons  of  a  certain  salary,  one 
would  search  the  salary  file  for  the  salary  range,  extract  the  associated 
employee  number,  and  then  search  the  insurance  file  with  the  employee  number 
to find the premium. Notice that, since the information is present in the
Symbol Table, the system can make this cross-file match without the user
having to specify which files are needed. There is no need to mention an
explicit sort and the related recopying, which can monopolize time, because
there is no need to order records on the disk.
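The salary and insurance example can be sketched as a lookup through the shared key EMPLOYEE_NO. The sketch assumes that both files are keyed by employee number and that a small table (`LOCATED_IN`, hypothetical) records which file holds each data item, as the Symbol Table would. Neither file is sorted.

```python
# Hypothetical files, both keyed by EMPLOYEE_NO.
FILES = {
    "SALARY_FILE": {1: {"SALARY": 9000}, 2: {"SALARY": 12000},
                    3: {"SALARY": 15000}, 4: {"SALARY": 11000}},
    "INSURANCE_FILE": {1: {"PREMIUM": 100.0}, 2: {"PREMIUM": 150.0},
                       3: {"PREMIUM": 200.0}, 4: {"PREMIUM": 130.0}},
}
# Which file holds each data item, as a Symbol Table would record it.
LOCATED_IN = {"SALARY": "SALARY_FILE", "PREMIUM": "INSURANCE_FILE"}


def average_of(item, where_item, low, high):
    """Average `item` over employees whose `where_item` lies in [low, high].
    The files are chosen from LOCATED_IN, not named by the user."""
    source = FILES[LOCATED_IN[where_item]]
    target = FILES[LOCATED_IN[item]]
    # Step 1: search the salary file for the range, collecting employee numbers.
    keys = [k for k, rec in source.items() if low <= rec[where_item] <= high]
    # Step 2: look up each premium directly by the shared key.
    values = [target[k][item] for k in keys if k in target]
    return sum(values) / len(values) if values else None


print(average_of("PREMIUM", "SALARY", 10000, 13000))  # 140.0
```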

Since  a  Symbol  Table  controls  all  structuring,  easy  and  accurate 
file  structure  manipulations  can  be  coded.   Some  changes  require  only  the 
update  of  the  pointers  within  each  record  and  need  no  recopying.   If  a  copy 
is  required,  it  will  be  forced  by  a  system  call  to  the  proper  set  of  algo- 
rithms which,  once  debugged,  will  reduce  risks  of  incorrect  copying  and  losses 
of  information  for  all  users.   Structuring  of  a  new  file  will  be  done  by  in- 
dicating to  the  Symbol  Table  which  data  items  are  to  be  filled  from  old  data 
and  which  from  new  data. 

Since  the  Symbol  Table  is,  itself,  a  managed  file,  it  must  also 
have  a  tree  structure.   Its  structure  is  defined  explicitly  in  the  IMS  Symbol 
Table.   To  change  the  structure  of  the  Symbol  Table,  one  only  has  to  change 
the  IMS  Symbol  Table.   For  example,  perhaps  a  new  subfield  would  be  useful  in 
the Symbol Tables. It would be entered by name into the IMS Symbol Table
which  forces  a  change  in  the  other  Symbol  Tables.   Such  a  change  in  the  Symbol 
Table  may  cause  changes  in  the  record  structures.   These  changes  should  be 
carefully  thought  out,  since  every  record  in  the  IMS  might  be  changed  to  carry 
out  the  restructuring.   Although  it's  not  a  conceptually  difficult  problem,  it 
could  be  very  expensive  in  terms  of  time.   The  following  is  a  list  of  the 
effects  of  changing  various  Symbol  Tables. 

1)  The  structure  of  the  IMS  Symbol  Table  never  changes 
because  it  is  fixed  by  the  designer  of  the  IMS. 

2)  Changes  to,  or  additions  of,  entries  to  the  IMS  Symbol 
Table  cause  changes  to  the  structure  of  the  user  Symbol 
Tables. 

3)  Changes  or  additions  to  entries  of  a  user  Symbol  Table 
cause  changes  to  the  structures  of  the  records  in 
respective  files  of  that  user's  data  base. 

4) If type (2) changes cause type (3) changes, then every
record in every file of every data base in the IMS is
changed.
The following is a description of each of the subfields in the IMS
Symbol Table:

Name 

This  is  a  reference  number  for  each  table  subfield  which  represents 
a  column  definition  for  user  Symbol  Tables  or  is  the  alphabetic  name  of  a 
data  base  that  exists  in  the  IMS. 

Entry  Number 

This  is  a  reference  number  for  each  table  entry  which  is  actually 
the  subscript  displacement  from  the  beginning  of  the  table. 

Size 

This reserves a certain number of bytes for column definitions. If
the name is a data base name, the size is the current number of entries in that
data base's Symbol Table. If the name is IMS, the size is the number of current
data bases in the system. If the name is SYMBOL_TABLE, the size is the number
of columns in a Symbol Table. If the name is IMS_SYMBOL_TABLE, the size is
the number of entries in the IMS Symbol Table.

Parent,  Level,  Rank  Among  Sisters 

All  define  the  structural  relationship  of  the  column  definition 
entries  for  the  user  Symbol  Tables  which  then  allow  the  SDSS  routines  to  be 
used  to  restructure  and  make  entries  or  deletions  to  the  user  Symbol  Tables, 
as  if  they  were  files  like  any  other  files. 

Base  Address 

If the name is a data base name, then this is the address of the
base of the Symbol Table for this data base if it is in the ILLIAC IV memory;
otherwise it is zero.

Figures 14 through 20 show examples of Symbol Tables, resulting key
and record element formats, and associated data structures.

A packing scheme is used so that a record format does not need a pointer
for every field. At each node of the tree, the first entries are pointers to
each variable-length daughter node and to the beginning daughter node of each
sequence of repetitive nodes; next follow all the fixed-length items, followed
by the variable and repetitive items. Terminal nodes have no pointers, since
they would be null. In this scheme, there is one pointer for each occurrence of
a variable or repetitive node.
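A single node packed under this scheme can be sketched as follows. The sketch makes several assumptions of its own: pointers are 2-byte big-endian byte offsets from the start of the node, only one level of elementary daughters is packed, and repetitive sequences are omitted (they would receive one pointer to their first repetition in the same way). Each variable item ends where the next pointer begins, or at the end of the node, so no lengths need to be stored.

```python
import struct

PTR = struct.Struct(">H")  # assumed pointer format: 2-byte big-endian offset


def pack_node(fixed: list[bytes], variable: list[bytes]) -> bytes:
    """Pack one node: pointers to the variable daughters first, then the
    fixed-length daughters, then the variable-length daughters."""
    header_len = PTR.size * len(variable)
    body = b"".join(fixed)
    offset = header_len + len(body)
    pointers = []
    for item in variable:
        pointers.append(PTR.pack(offset))
        offset += len(item)
    return b"".join(pointers) + body + b"".join(variable)


def unpack_node(data: bytes, fixed_sizes: list[int], n_variable: int):
    """Inverse of pack_node; fixed sizes would come from the Symbol Table."""
    pointers = [PTR.unpack_from(data, i * PTR.size)[0] for i in range(n_variable)]
    pos = PTR.size * n_variable
    fixed = []
    for size in fixed_sizes:
        fixed.append(data[pos:pos + size])
        pos += size
    ends = pointers[1:] + [len(data)]
    variable = [data[start:end] for start, end in zip(pointers, ends)]
    return fixed, variable


# Hypothetical values: two fixed items, then two variable-length names.
node = pack_node([b"042", b"07"], [b"SMITH", b"JOHN"])
print(unpack_node(node, [3, 2], 2))
```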

The  author  understands  that  the  packing  scheme  used  in  the  examples 
as  defined  by  the  Symbol  Table  field  definitions  is  not  the  most  efficient  -- 
with  respect  to  the  number  of  pointers  required  per  record  format.   There 
exist  trade-offs  between  adding  more  descriptor  fields  to  the  Symbol  Tables, 
the  number  of  pointers  required  in  record  formats,  and  the  complexity  of  the 
tree  climbing  and  record  format  packing  algorithms. 

The  following  is  a  description  of  each  of  the  subfields  in  the  Symbol 
Table  that  has  been  established  by  the  IMS  Symbol  Table. 


Name

The alphabetic name of the data item whose structural relationship
with other data items is being described. In our examples, some of these have
been PATIENT_NO, ILLNESS, and INSURANCE.

Entry Number

The reference number for each table entry, which is the actual
subscript displacement from the beginning of the table.

Size

The length (in bits or bytes) of a fixed-length item. If the data
item in question is variable length, such as a description, or if it is a
repetitive data item, this entry is zero. Notice that any field which has a
repetitive or variable-length subfield is, therefore, itself of variable length.

Repeat

A one-bit indication as to whether the entry is a repetitive data item.
00 FILE_NAME IS PATIENT_FILE
  01 PATIENT_RECORD
    02 NAME
      03 LAST_NAME CHARACTER (15)
      03 FIRST_NAME CHARACTER (15)
    02 PATIENT_NO INTEGER (7) KEY
    02 ILLNESS REPETITIVE
      03 ILLNESS_NAME
      03 DRUG_NAME CHARACTER (15)
00 FILE_NAME IS INSURANCE_FILE
  01 PATIENT_RECORD
    02 PATIENT_NO INTEGER (7) KEY
    02 INSURANCE
      03 POLICIES REPETITIVE
        04 INSTITUTION_NAME CHARACTER (15)
        04 COVERAGE_TYPE INTEGER (2)
    02 TOTAL_PREMIUMS DECIMAL (7,2)

Example File and Record Format for the X_HOSPITAL_DATA_BASE
Figure 16
The Data Structure Defined by the Symbol Table

Figure 18
Level

The level of this data item in the tree structure.

Parent

The entry number in the table of the parent data item for this entry.
If the entry is a file name, this points to its parent (the data base name in
the IMS Symbol Table).

Access

Contains security code information to protect the data from misuse by
unauthorized users.

Variable

A one-bit indication that tells whether the entry is a variable data item.

Thread

The entry number in the table of the first daughter of the data item.
If there are no daughters for this entry, the pointer will point to the next
sister at the same level. If there are no sisters, it points to the next sister
of the parent. If there is no such item, it will point to the next sister up
another level. This entry allows simple traversal of the data tree.

Sorted

A one-bit indication as to whether or not the repetitive data item is
sorted.

Displacement

The byte displacement from the base of the parent field in the record
format for a fixed-length entry, or the displacement from the base of the
parent field for the entry if this is a variable-length or repetitive entry.

Number of Pointers

The number of pointers at the beginning of this entry's record format
field. There is a pointer to each variable or repetitive subfield. The list of
pointers is the first thing to appear in the field. All pointers are
fixed-length.

Rank Among Sisters

A number which indicates how many sisters are to the left of this data
item. The purpose again is for touring the tree.

Packing Order

The rank among sisters as they appear in the record formats, i.e.,
fixed-length items are shifted forward in their fields.

Disk Presence

A one-bit indication that tells, if the item is a file, whether it is
presently on the ILLIAC IV disk.

Cross Reference

A circular linked list by entry number if this data item has the same
name as another. These data items may be distinguished by their parents.

Units

A code representing the physical units of the stored data, e.g., feet,
miles, or years.

Key

A one-bit code that indicates if a data item is being used as a key.

Type

A code representing the data type, e.g., integer, real, symbolic, link,
vector, matrix, or contextual.

Attribute

A code representing more information about the data type, e.g., a matrix
may be diagonal or real. Decimals may have the decimal point in a specific
location.
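The Thread subfield described above is, in effect, a preorder successor pointer, so the whole data tree can be toured by following Thread from the root. The sketch below computes Thread values from Parent values and then tours the tree. The rows follow the PATIENT_FILE of Figure 16; the tuple layout and the function names are hypothetical, and entries are assumed to be listed in preorder.

```python
# Hypothetical Symbol Table rows: (entry number, name, parent entry or None).
ROWS = [
    (0, "PATIENT_RECORD", None),
    (1, "NAME", 0),
    (2, "LAST_NAME", 1),
    (3, "FIRST_NAME", 1),
    (4, "PATIENT_NO", 0),
    (5, "ILLNESS", 0),
    (6, "ILLNESS_NAME", 5),
    (7, "DRUG_NAME", 5),
]


def compute_threads(rows):
    """THREAD = first daughter if any; else next sister; else the next
    sister of the nearest ancestor that has one; None at the end."""
    parent = {e: p for e, _, p in rows}
    daughters = {e: [] for e, _, _ in rows}
    for e, _, p in rows:
        if p is not None:
            daughters[p].append(e)  # list position = rank among sisters

    def next_sister(e):
        p = parent[e]
        if p is None:
            return None
        sisters = daughters[p]
        i = sisters.index(e)
        return sisters[i + 1] if i + 1 < len(sisters) else None

    thread = {}
    for e, _, _ in rows:
        if daughters[e]:
            thread[e] = daughters[e][0]
            continue
        node = e
        while node is not None and next_sister(node) is None:
            node = parent[node]  # climb until an ancestor has a next sister
        thread[e] = next_sister(node) if node is not None else None
    return thread


def walk(rows, start=0):
    """Tour the tree by following THREAD pointers only."""
    names = {e: n for e, n, _ in rows}
    thread = compute_threads(rows)
    order, e = [], start
    while e is not None:
        order.append(names[e])
        e = thread[e]
    return order


print(walk(ROWS))
```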

D.4 Data Types and Attributes and Characteristic Definitions

The  system  will  be  able  to  process  any  form  of  data.   Along  with 
the  usual  numeric  and  character  types,  the  system  will  provide  for  the  fol- 
lowing data  types  of  data  items: 

1)  vectors 

2)  matrices 

3)  symbolic  coded 
4) English text
5)  relational  links 

Those data items requiring further explanation will be used as keys to
reference a descriptor file. This file will contain detailed English text
descriptions of the data item's semantic definition. Also entered in the
descriptor file are the descriptions of various attributes of the data item. For
example, in the NARIS system, the data item SOIL_TYPE may have the value A,
which certainly requires further explanation. There may exist several levels
of detail for these descriptions, since the descriptor file is a hierarchical
file like any other. The report generator may use this file to generate its
reports at several levels of detail. Explanations of data items may
conditionally be put out with the report. New users of a data base may use and
learn the system interactively while at the same time becoming familiar with
the various data items.

The relational link data type is used to cross-reference different
nodes within a tree, between trees of the same file, or across files. This
allows complete flexibility for performing complicated searches on a data base.
For instance, in a natural resource data base such as NARIS, which maintains
data on a 40-acre tract basis, it would greatly aid search and trace routines
to link together all the tracts through which a river flows. Tag bits can be
declared to identify the type of link being made (to differentiate it from
other links), and the descriptor file can maintain an English explanation of
the relation associated with a particular link.
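A trace routine over tagged links might look like the following sketch. The `Link` layout (tag, target file, target record), the tract data, and the function `trace` are hypothetical. The tag selects which kind of relation to follow, so the river chain is followed and the unrelated ROAD link is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    tag: str     # hypothetical tag identifying the kind of link, e.g. "RIVER"
    file: str    # file holding the target record
    record: int  # identifier of the target record in that file


# Hypothetical tracts; each river link points to the next tract downstream.
TRACTS = {
    1: {"SOIL_TYPE": "A", "LINKS": [Link("RIVER", "TRACT_FILE", 2)]},
    2: {"SOIL_TYPE": "B", "LINKS": [Link("RIVER", "TRACT_FILE", 5)]},
    5: {"SOIL_TYPE": "A", "LINKS": []},
    7: {"SOIL_TYPE": "C", "LINKS": [Link("ROAD", "TRACT_FILE", 1)]},
}
FILES = {"TRACT_FILE": TRACTS}


def trace(file, start, tag):
    """Follow links carrying `tag` from a starting record; stop at the end
    of the chain or if a record would be revisited."""
    path, seen = [], set()
    current = (file, start)
    while current is not None and current not in seen:
        seen.add(current)
        path.append(current[1])
        record = FILES[current[0]][current[1]]
        nxt = next((link for link in record["LINKS"] if link.tag == tag), None)
        current = (nxt.file, nxt.record) if nxt else None
    return path


print(trace("TRACT_FILE", 1, "RIVER"))  # [1, 2, 5]
```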

It has been found that ILLIAC IV is an efficient tool for contextual
pattern matching. This may allow simple content analysis to be performed on
English textual data for limited document retrieval purposes, although this
system is not explicitly designed for such activities. Pattern matching also
allows the use of a thesaurus file so that equivalent data item names and values
may be used in place of each other. For example, WHITE could be used
equivalently as CAUCASIAN for the value of a RACE data item. A search on either
key value would return the same records.
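A thesaurus lookup can be sketched by reducing both the query value and each stored value to a canonical term before comparing. The single-entry thesaurus, the records, and the function names are hypothetical; equivalent data item names could be handled by the same mapping.

```python
# Hypothetical thesaurus file: each term maps to a canonical term.
THESAURUS = {"CAUCASIAN": "WHITE"}

RECORDS = [
    {"ID": 1, "RACE": "WHITE"},
    {"ID": 2, "RACE": "CAUCASIAN"},
    {"ID": 3, "RACE": "ASIAN"},
]


def canonical(term: str) -> str:
    """Replace a term by its thesaurus equivalent, if it has one."""
    return THESAURUS.get(term, term)


def find(item: str, value: str) -> list[int]:
    """Ids of records whose `item` is equivalent to `value`."""
    target = canonical(value)
    return [r["ID"] for r in RECORDS if canonical(r[item]) == target]


print(find("RACE", "WHITE"))      # [1, 2]
print(find("RACE", "CAUCASIAN"))  # [1, 2]
```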

Another use of efficient pattern matching is the ability to allow
partial pattern matches as a valid data item value identification. For example,
suppose an inquirer of a mental health data base wants to ask a question about
the distribution of patients among ethnic groups. The ethnic group of a patient
may be stored as SPANISH AMERICAN. Suppose the researcher wanted to ask a
question about SPANISH patients. He could then use a pattern matching operator
if he wanted to admit the information contained in records that had the word
SPANISH as any part of the ethnic group. He would not have to know all the
different combinations of SPANISH heritage to specify his query.
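A partial match of this kind can be sketched with a whole-word regular expression, so that SPANISH matches SPANISH AMERICAN without the user listing every combination. The records and the function name are hypothetical, and Python's `re` module stands in for whatever matching operator the query language would provide.

```python
import re

# Hypothetical ethnic-group values in a mental health data base.
RECORDS = [
    {"ID": 1, "ETHNIC_GROUP": "SPANISH AMERICAN"},
    {"ID": 2, "ETHNIC_GROUP": "AMERICAN INDIAN"},
    {"ID": 3, "ETHNIC_GROUP": "SPANISH"},
    {"ID": 4, "ETHNIC_GROUP": "SOUTH AMERICAN SPANISH"},
]


def find_containing_word(item, word):
    """Ids of records whose `item` value contains `word` as a whole word."""
    pattern = re.compile(rf"\b{re.escape(word)}\b")
    return [r["ID"] for r in RECORDS if pattern.search(r[item])]


print(find_containing_word("ETHNIC_GROUP", "SPANISH"))  # [1, 3, 4]
```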

In order to avoid writing the same specification for a frequently
used boolean combination of relations on data, the notion of a data
characteristic will be incorporated. A characteristic is a name associated with
a specific boolean combination of data items and other characteristics whose
values have been restricted. For example, the characteristics FERTILE and
CORN_LAND may be defined as follows:

DEFINE THE CHARACTERISTIC FERTILE TO BE SOIL_TYPE A AND SOIL_DEPTH
10 INCHES OR SOIL_TYPE B AND SOIL_SLOPE LESS THAN 10 DEGREES.

DEFINE THE CHARACTERISTIC CORN_LAND TO BE FERTILE AND SOIL_ACREAGE
GREATER THAN OR EQUAL TO 10 ACRES.

Researchers may build up very complicated characteristics and never have to
be concerned with the details of the definitions again. The library file will
maintain these definitions so that they can be used wherever data item names
are employed. Further information on these characteristics can be found by
name in the descriptor file.
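Named characteristics can be sketched as predicates stored in a library under their names, so that one characteristic can refer to another by name. `LIBRARY`, `define`, and `has` are hypothetical. The definition above omits the comparison before "10 INCHES"; the sketch ASSUMES it means "at least 10 inches". As in the English definition, AND is taken to bind more tightly than OR.

```python
from typing import Callable

Record = dict
Characteristic = Callable[[Record], bool]

LIBRARY: dict[str, Characteristic] = {}  # hypothetical library file


def define(name: str, predicate: Characteristic) -> None:
    """Store a characteristic under its name."""
    LIBRARY[name] = predicate


def has(name: str) -> Characteristic:
    """Refer to a previously defined characteristic by name."""
    return lambda rec: LIBRARY[name](rec)


# ASSUMPTION: the omitted comparison on SOIL_DEPTH is read as "at least".
define("FERTILE", lambda r: (r["SOIL_TYPE"] == "A" and r["SOIL_DEPTH"] >= 10)
       or (r["SOIL_TYPE"] == "B" and r["SOIL_SLOPE"] < 10))
define("CORN_LAND", lambda r: has("FERTILE")(r) and r["SOIL_ACREAGE"] >= 10)

tract = {"SOIL_TYPE": "B", "SOIL_DEPTH": 4, "SOIL_SLOPE": 6, "SOIL_ACREAGE": 40}
print(has("CORN_LAND")(tract))  # True
```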
The user should also be allowed to assign names to questions that he
would like to ask again but whose entire language body he would not like to
remember. These named routines may also be kept in the library
file.

No generalized information management system can foresee all
the idiosyncrasies of several users and their different data bases. It would
be very valuable to provide the ability for users to program, in a high-level
programming language (e.g., GLYPNIR or FORTRAN), routines tailored for their
own purposes to operate on retrieved data. These routines can be named and
stored in the library file for future reference.


Figure 21 is a diagram of the control and information flow for the SDSS.

Appendix E: A Suggested Design for the Information Retrieval System (IRS)
E.1 Responsibilities of the IRS

1) Provide a general search and control language (SCL)
into which a set of data-base-dependent question-
answering languages (QAL) can be compiled.

2) Provide, as part of each data-base-dependent QAL,
a report generation facility.

3) Allow SCL to communicate with and control the
Mathematical Computation System (MCS).

4) Provide for multi-user simultaneous query processing,
allowing an interactive mode.

E.2  The  IRS  Description 

Figure 22 is a diagram of the control and information flow for the
IRS. Several discrete languages have been suggested for conceptual purposes.
The main language will be the SCL language; each QAL could consist of procedure
names and associated parameters attached to segments of SCL code. Each class of
users could then build its own language. The reason for the separation is
that question-answering languages for different data bases would inherently be
different. A QAL for a natural resource inventory system would be
geographically oriented; questions would have a two-dimensional quality. However,
the QAL for a mental patient system would be inclined to look for statistical
relationships between the effects of several data items (variables) on other
data items. The same philosophy is employed for the implementation of
user-oriented report generators. However, the emphasis would be upon the subset
of report structures that would exist for all users.

The language must allow for conditional forms of instructions so
that different actions may be taken, depending on the outcome of some
mathematical routine or the value of a data item. Boolean combinations of
characteristics could be used to define the scope (i.e., the subset of records)
over which verb declarations can act. Verb forms would exist in classes that
coincide with their subsystem purposes. For example, GRAPH pertains to the report
generator; FIND relates to the IRS; CORRELATE is associated with the Statistical
System. The syntax for the languages is under study. Our direct link
to ARPA-network members with experience in this area, such as Stanford
University, MIT, and BM, should be of great help. We can experiment with their
languages to determine a useful syntax.

The time-sharing and interactive capabilities are limited by the
first version of the operating system. The first version of the ILLIAC IV IMS
will have to provide these facilities in a roundabout way. The sequence of
events will be as follows:

1) Load the IMS;

2) Batch and queue several independent queries;

3) Process each query to its first report stage;

4) Report results, but save all intermediate files
and variable values;

5) Release the IMS for other uses;

6) After a short time (on the order of a few seconds),
return to step 1.

The batching facility and re-initiation of the system on a cyclic basis will
simulate a time-sharing system; the retention of all intermediate results will
provide an interactive capability. The entire sequence, (1) through (6),
should only take on the order of a few seconds.
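The cycle above can be simulated as follows. The sketch rests on simplifying assumptions: a query is a fixed list of hypothetical stages, each ending in a report, where in practice the user's follow-up questions would arrive interactively; the delay between cycles is not modeled; and `Query` and `run_cycle` are hypothetical names. Each cycle advances every queued query by one stage, saves its intermediate results, and requeues it if more stages remain.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Query:
    """A hypothetical query made of several stages, each ending in a report."""
    user: str
    stages: list[str]
    done: int = 0                              # stages completed so far
    saved: dict = field(default_factory=dict)  # intermediate files and values


def run_cycle(queue: deque) -> list[str]:
    """One pass of steps 1-5: load the IMS, advance each queued query to its
    next report stage, report, save state, and release the IMS."""
    reports = []
    for _ in range(len(queue)):         # step 2: the batch is what is queued now
        q = queue.popleft()
        stage = q.stages[q.done]        # step 3: run to the next report stage
        q.saved[stage] = f"results of {stage}"
        q.done += 1
        reports.append(f"{q.user}: {stage}")  # step 4: report, keep state
        if q.done < len(q.stages):
            queue.append(q)             # resume in a later cycle
    return reports                      # step 5: release the IMS


queue = deque([Query("A", ["FIND", "CORRELATE"]), Query("B", ["FIND"])])
cycle = 0
while queue:                            # step 6: return to step 1
    cycle += 1
    print(cycle, run_cycle(queue))
```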


E.3 Interaction with the Mathematical Computation System (MCS)

The essence of the IMS is its ability to mathematically operate on
several large independent data bases. The ILLIAC IV computer provides the
capability for direct high-speed processing of several files located
simultaneously on its disk. Very intricate cross-file searches can retrieve data
and pass it to extensive computation systems residing in the MCS. In a matter of
seconds, significant results can be reported to researchers, where such
operations would normally take hours (if they were economically feasible at all).

References to these computational systems will be embedded within
the SCL language, providing consistent accessibility to widely different
systems. A hierarchical "language-machine" approach to complex subsystem
design is suggested to provide this systematic interactive ability. A byproduct
of this design is the ability for the computation systems to use the
facilities of the IMS to maintain intermediate files. Thus, these systems
may be developed without being concerned with details of complex data
management. This considerably lessens development costs for the individual
subsystems.

This approach is simply an extension of the philosophy behind designing high-level languages to run on "high-level language machines". These "machines" are then simulated by machine instructions on the computer doing the simulation. This allows programmers, for example, to view a PL1 program as code written for a "PL1 machine". A hierarchy is established because the PL1 program may be compiled into an assembly language that views its machine as an "assembly language machine". This code is then assembled into hard machine code. Figure 23 illustrates the process.
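
A deliberately tiny example can make the hierarchy of "machines" concrete. The statement form, the four mnemonics, and the single-accumulator machine are invented for illustration. They do not correspond to PL1 or to any real instruction set. Each function acts as the translator for the "machine" one level below it.

```python
# Toy three-level hierarchy (invented for illustration; not PL1 or a real instruction set).
OPCODES = {"LOAD": 1, "ADD": 2, "SUB": 3, "STORE": 4}
OPERATORS = {"+": "ADD", "-": "SUB"}


def compile_statement(stmt):
    """High-level "machine" -> assembly language "machine": 'x = a + b' becomes three lines."""
    target, expr = (part.strip() for part in stmt.split("="))
    left, op, right = expr.split()
    return [f"LOAD {left}", f"{OPERATORS[op]} {right}", f"STORE {target}"]


def assemble(lines, symbols):
    """Assembly language "machine" -> hard machine code: (opcode, address) pairs."""
    code = []
    for line in lines:
        mnemonic, name = line.split()
        address = symbols.setdefault(name, len(symbols))  # allocate a cell on first use
        code.append((OPCODES[mnemonic], address))
    return code


def execute(code, memory):
    """The simulated hardware: one accumulator and numbered memory cells (default 0)."""
    acc = 0
    for opcode, address in code:
        if opcode == 1:
            acc = memory.get(address, 0)
        elif opcode == 2:
            acc += memory.get(address, 0)
        elif opcode == 3:
            acc -= memory.get(address, 0)
        elif opcode == 4:
            memory[address] = acc
    return memory


symbols = {}
code = assemble(compile_statement("x = a + b"), symbols)
memory = execute(code, {symbols["a"]: 7, symbols["b"]: 5})
print(code)                  # [(1, 0), (2, 1), (4, 2)]
print(memory[symbols["x"]])  # 12
```
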
This structure, applied to subsystem design, is shown in the next diagram, Figure 24. It is essentially the same as the previous example, except that the "machines" are the relevant subsystems and any subsystem can generate programs for other subsystems.
Hierarchy Language Control of Subsystems
Figure 24


Essentially, each program system is a collection of subroutines with a driver subroutine that accepts long sequences of parameters (i.e., a language), executes them, and can generate requests to other subsystems. There is no reason why a lower-level system cannot call a higher-level one, as long as the call does not cause an infinite loop.
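
The driver-subroutine arrangement can be sketched as a dispatcher to which each subsystem registers a driver. A driver reads a sequence of commands (its "language") and executes the ones it understands. It may also pass generated programs to any other subsystem, whether higher or lower. All names and commands here are hypothetical. The depth limit is only one assumed way to stop a call chain from looping forever; the report does not specify a mechanism.

```python
class Dispatcher:
    """Routes programs (lists of command tuples) to registered subsystem drivers."""

    def __init__(self, max_depth=8):
        self.drivers = {}
        self.max_depth = max_depth  # assumed guard against endless call chains

    def register(self, name, driver):
        self.drivers[name] = driver

    def run(self, name, program, depth=0):
        if depth > self.max_depth:
            raise RecursionError(f"call chain too deep at subsystem {name!r}")

        def call(subsystem, generated_program):
            # Any subsystem, at any level, may send a generated program to another.
            return self.run(subsystem, generated_program, depth + 1)

        return self.drivers[name](program, call)


def data_driver(program, call):
    """Hypothetical data-management subsystem."""
    store = {"incomes": [40, 55, 70]}
    results = []
    for command, *args in program:
        if command == "FETCH":
            results.append(store[args[0]])
        elif command == "SUMMARIZE":
            # Generate a one-line program for the computation subsystem.
            results.append(call("compute", [("MEAN", store[args[0]])]))
    return results


def compute_driver(program, call):
    """Hypothetical computation subsystem."""
    results = []
    for command, *args in program:
        if command == "MEAN":
            results.append(sum(args[0]) / len(args[0]))
    return results


d = Dispatcher()
d.register("data", data_driver)
d.register("compute", compute_driver)
print(d.run("data", [("FETCH", "incomes"), ("SUMMARIZE", "incomes")]))

d.register("loop", lambda program, call: call("loop", program))  # calls itself forever
try:
    d.run("loop", [])
except RecursionError as err:
    print(err)
```

The last registration creates a subsystem that keeps calling itself. The dispatcher stops it with an error instead of looping without end.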
The following diagram, Figure 25, shows how the language-subsystem approach applies to the IMS and MCS subsystems.
The Language Subsystem Approach Applied to the IMS and MCS Systems

Figure 25


In Figure 25, an arrow denotes a compilation step, which is simply a conditional call of the subroutines in the next system.
To reiterate, this modular approach allows simple coordination of the total system through a uniform language development. It provides for independent use of the different subsystems (as far as the user is concerned). Changes may be made to one subsystem without necessarily requiring code changes in the others. Having several compilation and interpretation steps is a disadvantage, but it is offset by the processing power of the ILLIAC IV system. Moreover, the design is conceptually very simple, which eases development, implementation, and debugging.